develooper Front page | perl.perl5.porters | Postings from June 2012

Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]

From: Jesse Luehrs
Date: June 7, 2012 10:33
Subject: Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
Message ID: 20120607173307.GL5599@tozt.net

On Thu, Jun 07, 2012 at 07:16:15PM +0200, demerphq wrote:
> On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> > * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
> >> However it seems wrong to test for #chars != #bytes, because binary
> >> data _should_ be passed as byte strings, that is, with Perl's internal
> >> utf8 flag off.
> >
> > Disagree.
> >
> > The UTF8 flag is completely irrelevant to a string’s semantics.
> 
> Please stop saying this. It is the same flawed logic that means I can't
> send a bitvector in JSON reliably, which is a problem we DO NOT WANT
> in Perl.
> 
> It is simply not true. If a string contains binary data then it is
> binary, and treating it as utf8 in any form is completely and utterly
> wrong.

But it is true. I don't really see how what you said contradicts what
Aristotle said. If a binary string happens to contain only bytes less
than or equal to 0x7f, then whether the UTF8 flag is on or off is
irrelevant - perl will treat the two the same, and application code
should treat them the same as well. You're conflating the way that perl
stores the string data internally (which is what the UTF8 flag
represents) with what the data actually represents (which is a string of
characters, which could be interpreted as a byte string if all of the
characters are less than or equal to 0xff). A string containing binary
data could easily have the UTF8 flag on without changing its meaning,
because the UTF8 flag has no relevance to the semantic meaning of the
data.
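
A minimal sketch of that point, assuming a stock perl 5 with the built-in
utf8:: functions (the variable names are made up for illustration): the
same four "binary" characters are stored both ways, and perl still
compares them as equal.

```perl
use strict;
use warnings;

# Hypothetical example data: four characters, all <= 0xff.
my $bytes    = "\x00\x7f\x80\xff";
my $upgraded = $bytes;

# utf8::upgrade changes only the internal storage (and sets the UTF8 flag);
# it does not change which characters the string contains.
utf8::upgrade($upgraded);

printf "flag:   %d vs %d\n",
    utf8::is_utf8($bytes)    ? 1 : 0,
    utf8::is_utf8($upgraded) ? 1 : 0;
printf "equal:  %s\n", $bytes eq $upgraded ? "yes" : "no";
printf "length: %d vs %d\n", length($bytes), length($upgraded);

# Because every character is <= 0xff, the string can be downgraded again
# without loss, so it is still usable as a byte string.
my $copy = $upgraded;
my $ok   = utf8::downgrade($copy, 1);   # second arg: return false instead of dying
printf "downgrade ok: %s\n", $ok ? "yes" : "no";
```

The flags differ, but eq and length() report the same string in both
cases, which is the sense in which the flag says nothing about what the
data means.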

-doy
