perl.perl5.porters | Postings from June 2012

Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]

From: demerphq
Date: June 7, 2012 16:54
Subject: Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
Message ID: CANgJU+U17ReDACMzgqDD+Ye5nZN67vZS5Pk_Vux473RbcEPhEg@mail.gmail.com
On 7 June 2012 19:33, Jesse Luehrs <doy@tozt.net> wrote:
> On Thu, Jun 07, 2012 at 07:16:15PM +0200, demerphq wrote:
>> On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
>> > * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
>> >> However it seems wrong to test for #chars != #bytes, because binary
>> >> data _should_ be passed as byte strings, that is, with Perl's internal
>> >> utf8 flag off.
>> >
>> > Disagree.
>> >
>> > The UTF8 flag is completely irrelevant to a string’s semantics.
>>
>> Please stop saying this. It is the same flawed logic that means I can't
>> send a bitvector in JSON reliably, which is a problem we DO NOT WANT
>> in Perl.
>>
>> It is simply not true. If a string contains binary data then it is
>> binary, and treating it as utf8 in any form is completely and utterly
>> wrong.
>
> But it is true. I don't really see how what you said contradicts what
> Aristotle said. If a binary string happens to contain all bytes less
> than 0x80, then whether the UTF8 flag is on or off is irrelevant - perl
> will treat them the same, and application code should treat them the
> same as well. You're conflating the way that perl stores the string data
> internally (which is what the UTF8 flag represents) with what the data
> actually represents (which is a string of characters, which could be
> interpreted as a byte string if all of the characters are less than or
> equal to 0xff). A string containing binary data could easily have the
> UTF8 flag on without changing its meaning, because the UTF8 flag has no
> relevance to the semantic meaning of the data.
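A minimal Perl sketch of the storage-versus-meaning distinction, assuming perl 5.8 or later (where the `utf8::` functions are built in). Upgrading a copy of a string sets the UTF8 flag and re-encodes its internal buffer, yet the two strings still compare equal at the Perl level; only code that inspects the raw buffer (simulated here with `use bytes`, in practice often XS or C code) sees a difference. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Four characters, two of them above 0x7F, built the way pack() builds binary data.
my $bin = pack('C*', 0x01, 0x7F, 0x80, 0xFF);

# Same value, but force perl to store it internally as UTF-8 (sets the UTF8 flag).
my $up = $bin;
utf8::upgrade($up);

# The UTF8 flag differs...
printf "flag: %d vs %d\n", (utf8::is_utf8($bin) ? 1 : 0), (utf8::is_utf8($up) ? 1 : 0);

# ...but at the Perl level both strings hold the same four characters.
printf "eq: %s, length: %d vs %d\n", ($bin eq $up ? 'yes' : 'no'), length($bin), length($up);

# Only the internal buffer differs: 0x80 and 0xFF each take two bytes in UTF-8.
my $internal_len = do { use bytes; length($up) };
printf "internal bytes: %d\n", $internal_len;
```

This is the point both sides agree on at the Perl level; the disagreement in the thread is about code that bypasses that level and reads the buffer directly.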

Some strings contain binary data, such as structs intended to be passed
into C code, or the result of pack() or vec(). If you treat them as utf8
you either get broken utf8 (such as via vec(), iirc) or broken binary
data.
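The "broken binary data" case can be sketched in Perl as follows. This assumes that "treating it as utf8" means UTF-8-encoding the string as if it were text, for example before embedding it in a JSON document; the struct layout (`N n`, a 32-bit and a 16-bit unsigned integer in network order) is a hypothetical example, not taken from the thread.

```perl
use strict;
use warnings;

# A C-style struct: 32-bit and 16-bit unsigned ints in network byte order.
my $struct = pack('N n', 0xDEADBEEF, 0x00FF);   # 6 bytes: DE AD BE EF 00 FF

# Mistake: treat the binary string as text and encode it as UTF-8.
my $wire = $struct;
utf8::encode($wire);

# Every byte >= 0x80 became two bytes, so the struct layout no longer matches.
my ($word, $short) = unpack('N n', $wire);
printf "length: %d -> %d\n", length($struct), length($wire);
printf "first field: 0x%08X (expected 0xDEADBEEF)\n", $word;
```

A Perl receiver that knows to call `utf8::decode` could recover the original values, but a C consumer reading the struct directly would not.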

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org