Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]

From: demerphq
Date: June 7, 2012 16:53
Subject: Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
Message ID: CANgJU+V5+-niW1gbU+wcPzwGCZ8-tfApwwtANzADU-1jrf1cPQ@mail.gmail.com

On 7 June 2012 17:33, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> * Jim Avera <perlbug-followup@perl.org> [2012-05-26 03:10]:
>> However it seems wrong to test for #chars != #bytes, because binary
>> data _should_ be passed as byte strings, that is, with Perl's internal
>> utf8 flag off.
>
> Disagree.

And I agree with you on this, although for different reasons.
My previous reply was only about this:

> The UTF8 flag is completely irrelevant to a string’s semantics. Wherever
> it’s treated as meaningful, that is a bug that should be fixed.

The UTF8 flag means I can't twiddle bits in the string. If the string
has the utf8 flag on I cannot, at least not without potentially creating
broken utf8.

And if a binary string is "upgraded" you get a mess. (Can you tell I am bitter?)
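
To make the "upgraded" case concrete, here is a minimal sketch. The
variable names are made up for illustration, and bytes::length is used
only to peek at the internal representation (the bytes module is meant
for debugging, not for normal code). The two strings compare equal as
characters, but the upgraded copy is stored differently internally,
which is what a "#chars != #bytes" test ends up seeing.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma

# Two raw bytes; a plain literal like this has the utf8 flag off.
my $bin = "\xE9\xFF";

# Same two characters, but now stored internally as UTF-8.
my $up = $bin;
utf8::upgrade($up);

my $same_chars   = ($bin eq $up) ? 1 : 0;    # compared as characters
my $char_len     = length $up;               # length in characters
my $internal_len = bytes::length($up);       # length of the internal buffer
my $flag_before  = utf8::is_utf8($bin) ? 1 : 0;
my $flag_after   = utf8::is_utf8($up)  ? 1 : 0;

printf "equal=%d chars=%d internal=%d flag: %d -> %d\n",
    $same_chars, $char_len, $internal_len, $flag_before, $flag_after;
```

Any code that works on the internal buffer of the upgraded copy, rather
than on its characters, sees four bytes where the original had two.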

> So it
> seems to me at first sight that the string should just reach the fast
> exit check untouched and be left for the remaining code to deal with.
>
> But on closer read I get a vague impression that the intent of the code
> in the whole function is based on confused notions about encodings. And
> that it therefore possibly should be done over entirely. I am not yet
> sure exactly what it is trying to achieve, though.

I am pretty sure the idea is that we use \x{..} escapes for code points
0x80 and up when the string has the utf8 flag on. Octal is used otherwise.
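
Roughly, the rule I have in mind looks like the sketch below. This is
not the actual Data::Dumper code: escape_high is a hypothetical helper
written only to illustrate the two branches, and it ignores everything
else the real quoting function has to handle (control characters,
quotes, backslashes and so on).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Data::Dumper's implementation.
# Escapes only characters at 0x80 and above:
#   utf8 flag on  -> \x{...} hex escapes
#   utf8 flag off -> three-digit octal escapes
sub escape_high {
    my ($str) = @_;
    if (utf8::is_utf8($str)) {
        $str =~ s/([^\x00-\x7f])/sprintf('\x{%x}', ord($1))/ge;
    }
    else {
        # 0x80..0xff always formats as three octal digits (200..377).
        $str =~ s/([\x80-\xff])/sprintf('\\%o', ord($1))/ge;
    }
    return $str;
}

my $raw      = "caf\xE9";    # byte string, flag off
my $upgraded = $raw;
utf8::upgrade($upgraded);    # same characters, flag on

print escape_high($raw),      "\n";    # caf\351
print escape_high($upgraded), "\n";    # caf\x{e9}
```

The same characters come out with different escapes depending only on
the flag, which is the information the dump is trying to preserve.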

This is actually significant data: it means you can round-trip a raw
string of bytes and have them come back with the utf8 flag off, or send
the same "chars" as utf8. (Although whether this actually works via
*eval* is another question.)
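
A small sketch for checking the eval side of this. The variable names
are only for illustration. It evals three dumped literals and looks at
the flag on each result. As far as I know, a \x{..} escape for a code
point below 0x100 does not by itself turn the flag on, which is why the
third case is the doubtful one.

```perl
use strict;
use warnings;

# Dumped forms as they would appear in Data::Dumper output with Useqq.
my $from_octal = eval q{"caf\351"};      # octal escape, byte string
my $from_wide  = eval q{"\x{263a}"};     # code point above 0xff
my $from_low   = eval q{"caf\x{e9}"};    # \x{..} escape below 0x100

printf "octal:   flag=%d\n", utf8::is_utf8($from_octal) ? 1 : 0;
printf "wide:    flag=%d\n", utf8::is_utf8($from_wide)  ? 1 : 0;
printf "low hex: flag=%d\n", utf8::is_utf8($from_low)   ? 1 : 0;

# Whatever the flags say, the characters themselves compare equal.
print "same characters\n" if $from_octal eq $from_low;
```

The octal form comes back as bytes and the wide character comes back
flagged. The line to watch is the "low hex" one.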

>
> As an irrelevant aside,
>
>>    s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
>
> … it’s a mystery to me why the replacement expression was spelled
>
>    '\x{'.sprintf('%x',...).'}'
>
> instead of simply
>
>    sprintf('\x{%x}',...)
>
> and similarly for several other substitutions within the function.

Yes indeed, I was thinking the same thing. Sorry about the tone of my
previous mail; I shouldn't have replied without saying more.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
