On Thu, Jun 07, 2012 at 07:16:15PM +0200, demerphq wrote:
> On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> > * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
> >> However it seems wrong to test for #chars != #bytes, because binary
> >> data _should_ be passed as byte strings, that is, with Perl's internal
> >> utf8 flag off.
> >
> > Disagree.
> >
> > The UTF8 flag is completely irrelevant to a string’s semantics.
>
> Please stop saying this. It is the same flawed logic that means I can't
> send a bitvector in JSON reliably, which is a problem we DO NOT WANT
> in Perl.
>
> It is simply not true. If a string contains binary data then it is
> binary, and treating it as utf8 in any form is completely and utterly
> wrong.

But it is true. I don't really see how what you said contradicts what
Aristotle said. If a binary string happens to contain only bytes less
than or equal to 0x7f, then whether the UTF8 flag is on or off is
irrelevant: perl treats the two identically, and application code
should treat them identically as well.

You're conflating the way perl stores the string data internally (which
is what the UTF8 flag represents) with what the data actually
represents (a string of characters, which can be interpreted as a byte
string whenever all of the characters are less than or equal to 0xff).
A string containing binary data can easily have the UTF8 flag on
without any change in meaning, because the UTF8 flag has no bearing on
the semantic meaning of the data.

-doy
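
P.S. In case a concrete illustration helps, here's a minimal sketch of
what I mean (assumes any reasonably modern perl; the utf8::* functions
used below are core since 5.8):

    use strict;
    use warnings;

    # Two strings with identical characters but different internal
    # representations.
    my $bytes = "\x{e9}";    # one character, 0xE9; UTF8 flag off
    my $chars = "\x{e9}";
    utf8::upgrade($chars);   # same character, now stored as UTF-8
                             # internally; UTF8 flag on

    printf "flags:   bytes=%d chars=%d\n",
        utf8::is_utf8($bytes) ? 1 : 0,
        utf8::is_utf8($chars) ? 1 : 0;     # flags:   bytes=0 chars=1
    printf "lengths: %d %d\n",
        length($bytes), length($chars);    # lengths: 1 1
    print $bytes eq $chars
        ? "eq: equal\n"
        : "eq: NOT equal\n";               # eq: equal

Same character, same length, equal under eq; the only thing that
differs is how perl happens to store it.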