On Thu, Sep 2, 2021 at 9:53 AM demerphq <demerphq@gmail.com> wrote:

> Having said that I have seen a lot of people for one reason or another get
> encoding wrong in various ways, especially with MySQL or other over-wire
> situations. Double encoding errors are common (e.g. where people accidentally
> upgrade already encoded but flag-off utf8 data). At work we have a function
> called recurse_decode_utf8() which takes a string and does its best to
> "reduce" it to its minimal form by repeatedly turning off the utf8 flag,
> and then executing decode_utf8() on the string and then downgrade until the
> decode operation throws an error. Widespread use of this function on string
> data almost completely eliminated all of our utf8 problems. (I'll post the
> code in another mail.)

If it works for this case, fine, but please do not suggest this for general
use. This is guessing, and it results in decoding strings which were already
characters (false positives), because there is no way to differentiate a
valid string of UTF-8 bytes from a string of characters whose ordinals happen
to form a valid UTF-8 byte sequence.

The correct solution is to fix your double encoding, and always decode a
string the exact number of times it was encoded. The use of the utf8 flag to
"decode" is an unrelated problem.

-Dan
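
To illustrate the false-positive point, here is a minimal sketch using the core Encode module. It is not the recurse_decode_utf8() function mentioned in the quoted mail (that code is not shown in this message). The helpers ords() and decode_n_times() are hypothetical names made up for this example. It assumes the text in question is the two-character string "Ã©", which is perfectly legitimate text but whose ordinals also happen to be the UTF-8 encoding of "é".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a string as a list of code points.
sub ords { join ' ', map { sprintf 'U+%04X', ord } split //, $_[0] }

# Hypothetical helper: decode exactly $times layers of UTF-8, no guessing.
sub decode_n_times {
    my ($str, $times) = @_;
    for (1 .. $times) {
        # FB_CROAK: die on malformed input; LEAVE_SRC: do not modify the argument.
        $str = decode('UTF-8', $str, Encode::FB_CROAK | Encode::LEAVE_SRC);
    }
    return $str;
}

# 1. False positive: text that is already characters still "decodes" fine.
my $text  = "\x{C3}\x{A9}";              # two characters: "Ã©"
my $guess = decode_n_times($text, 1);    # no error, but the data is now wrong
print ords($text),  "\n";                # U+00C3 U+00A9
print ords($guess), "\n";                # U+00E9

# 2. Decoding the same number of times the data was encoded is always safe.
my $chars = "\x{E9}";                    # one character: "é"
my $once  = encode('UTF-8', $chars);     # bytes C3 A9
my $twice = encode('UTF-8', $once);      # double encoded: bytes C3 83 C2 A9
print ords(decode_n_times($once,  1)), "\n";   # U+00E9
print ords(decode_n_times($twice, 2)), "\n";   # U+00E9
```

Note that $text and $once contain exactly the same ordinals, so no function can tell from the string alone which one still needs decoding. Only the code that knows how many times the data was encoded can decide that.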