develooper Front page | perl.perl5.porters | Postings from September 2021

Re: Pre-RFC: Rename SVf_UTF8 et al.

From: Dan Book
Date: September 2, 2021 14:45
Subject: Re: Pre-RFC: Rename SVf_UTF8 et al.
Message ID: CABMkAVWhAMGqrUP5Vpku5XNt=NpJxw4XapWhzP_D1Pn7=JGowQ@mail.gmail.com

On Thu, Sep 2, 2021 at 9:53 AM demerphq <demerphq@gmail.com> wrote:

> Having said that, I have seen a lot of people, for one reason or another, get
> encoding wrong in various ways, especially with MySQL or other over-wire
> situations. Double encoding errors are common (e.g. where people accidentally
> upgrade already encoded but flag-off utf8 data). At work we have a function
> called recurse_decode_utf8() which takes a string and does its best to
> "reduce" it to its minimal form by repeatedly turning off the utf8 flag,
> and then executing decode_utf8() on the string and then downgrade until the
> decode operation throws an error. Widespread use of this function on string
> data almost completely eliminated all of our utf8 problems. (I'll post the
> code in another mail.)
>

If it works for this case, fine, but please do not suggest this for general
use. It is guessing, and it will decode strings that were already
characters (false positives), because there is no way to differentiate a
valid string of UTF-8 bytes from a string of characters whose ordinals
happen to form a valid UTF-8 byte sequence. The correct solution is to fix
your double encoding, and always decode a string exactly as many times as
it was encoded. The use of the utf8 flag to "decode" is an unrelated
problem.
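
To make the false positive concrete, here is a minimal sketch. It is in
Python rather than Perl only because Python keeps bytes and characters in
separate types, which makes the ambiguity easy to see. guess_decode() and
decode_exactly() are hypothetical names invented for this illustration;
guess_decode() is a rough stand-in for the "decode until it fails" idea and
is not the recurse_decode_utf8() from the quoted mail.

```python
def guess_decode(text: str) -> str:
    """Hypothetical 'decode until it fails' helper (illustration only)."""
    while True:
        try:
            # Reinterpret each character ordinal as one byte. This raises
            # UnicodeEncodeError if any ordinal is above 255.
            raw = text.encode("latin-1")
            # Raises UnicodeDecodeError if the bytes are not valid UTF-8.
            decoded = raw.decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            return text
        if decoded == text:
            # Pure ASCII decodes to itself; nothing left to reduce.
            return text
        text = decoded


def decode_exactly(raw: bytes, times: int) -> str:
    """Undo a known number of UTF-8 encoding layers, no guessing."""
    text = raw.decode("utf-8")
    for _ in range(times - 1):
        # Each extra layer was made by treating the encoded bytes as
        # Latin-1 characters and encoding them again.
        text = text.encode("latin-1").decode("utf-8")
    return text


# Genuine character data: U+00C3 followed by U+00A9. Nothing is wrong
# with this string, but its ordinals C3 A9 happen to be the UTF-8
# encoding of U+00E9, so the guessing helper silently rewrites it.
genuine = "\u00c3\u00a9"
mangled = guess_decode(genuine)

# U+00E9 encoded twice on its way over the wire: bytes C3 83 C2 A9.
wire = "\u00e9".encode("utf-8").decode("latin-1").encode("utf-8")
restored = decode_exactly(wire, 2)
```

The guessing helper cannot tell `genuine` apart from a once-encoded U+00E9,
so it changes data that was never broken. decode_exactly() leaves the same
characters alone when told they were encoded once, and only peels the second
layer when the caller knows it is there.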

-Dan
