On 7 June 2012 19:33, Jesse Luehrs <doy@tozt.net> wrote:
> On Thu, Jun 07, 2012 at 07:16:15PM +0200, demerphq wrote:
>> On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
>> > * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
>> >> However it seems wrong to test for #chars != #bytes, because binary
>> >> data _should_ be passed as byte strings, that is, with Perl's internal
>> >> utf8 flag off.
>> >
>> > Disagree.
>> >
>> > The UTF8 flag is completely irrelevant to a string’s semantics.
>>
>> Please stop saying this. It is the same flawed logic that means I cant
>> send a bitvector in JSON reliably, which is a problem we DO NOT WANT
>> in Perl.
>>
>> It is simply not true. If a string contains binary data then it is
>> binary, and treating it as utf8 in any form is completely and utterly
>> wrong.
>
> But it is true. I don't really see how what you said contradicts what
> Aristotle said. If a binary string happens to contain all bytes less
> than 0x7f, then whether the UTF8 flag is on or off is irrelevant - perl
> will treat them the same, and application code should treat them the
> same as well. You're conflating the way that perl stores the string data
> internally (which is what the UTF8 flag represents) with what the data
> actually represents (which is a string of characters, which could be
> interpreted as a byte string if all of the characters are less than or
> equal to 0xff). A string containing binary data could easily have the
> UTF8 flag on without changing its meaning, because the UTF8 flag has no
> relevance to the semantic meaning of the data.

Some strings contain binary data, such as structs intended to be passed
into C code, or the result of pack or vec. If you treat them as utf8 you
either have broken utf8 (such as via vec() iirc), or you have broken
binary data.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"