perl.perl5.porters | Postings from June 2012

Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]

June 7, 2012 16:53
On 7 June 2012 17:33, Aristotle Pagaltzis <> wrote:
> * Jim Avera <> [2012-05-26 03:10]:
>> However it seems wrong to test for #chars != #bytes, because binary
>> data _should_ be passed as byte strings, that is, with Perl's internal
>> utf8 flag off.
> Disagree.

And I agree with you on this, although for different reasons than you.
My previous reply was only about this:

> The UTF8 flag is completely irrelevant to a string’s semantics. Wherever
> it’s treated as meaningful, that is a bug that should be fixed.

The UTF8 flag determines whether I can twiddle bits in the string. If the
string has the utf8 flag on, I cannot, at least not without potentially
creating broken UTF-8.
And if a binary string is "upgraded", you get a mess. (Can you tell I am bitter?)
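A minimal sketch of what "upgrading" does to a binary string, using only core built-ins (`utf8::upgrade`, `utf8::is_utf8`, and `bytes::length` from the core `bytes` module). The string keeps its characters and compares equal, which is Aristotle's point, but its internal byte representation changes, which is what bites code that works on the raw bytes:

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the bytes pragma

my $bin = "\xff";      # one-byte binary string, utf8 flag off
my $up  = $bin;
utf8::upgrade($up);    # same character, now stored internally as UTF-8

printf "flag:           %d vs %d\n", (utf8::is_utf8($bin) ? 1 : 0), (utf8::is_utf8($up) ? 1 : 0);
printf "length:         %d vs %d\n", length($bin), length($up);
printf "internal bytes: %d vs %d\n", bytes::length($bin), bytes::length($up);
print  "eq:             ", ($bin eq $up ? "yes" : "no"), "\n";
```

This prints a flag of 0 vs 1, a length of 1 in both cases, 1 vs 2 internal bytes, and "yes" for `eq`.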

> So it
> seems to me at first sight that the string should just reach the fast
> exit check untouched and be left for the remaining code to deal with.
> But on closer read I get a vague impression that the intent of the code
> in the whole function is based on confused notions about encodings. And
> that it therefore possibly should be done over entirely. I am not yet
> sure exactly what it is trying to achieve, though.

I am pretty sure the idea is that we use \x{..} escapes for code points
0x80 and up when the string has the utf8 flag on, and octal escapes otherwise.
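A rough sketch of that rule, assuming it is all the function is meant to do. `qquote_sketch` is a hypothetical helper written for illustration, not Data::Dumper's actual `qquote`; it picks the escape style for non-ASCII characters from the utf8 flag and escapes just enough else to yield a valid double-quoted literal:

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Data::Dumper's real qquote(): it only illustrates
# choosing the escape style for non-ASCII characters from the utf8 flag.
sub qquote_sketch {
    my ($s) = @_;
    my $flag_on = utf8::is_utf8($s);

    # Backslash-escape characters that are special in a double-quoted literal.
    $s =~ s/([\\"\$\@])/\\$1/g;

    # ASCII control characters become three-digit octal escapes.
    $s =~ s/([\x00-\x1f\x7f])/sprintf('\\%03o', ord $1)/ge;

    if ($flag_on) {
        # Flag on: code points 0x80 and up become \x{..} escapes.
        $s =~ s/([^\x00-\x7f])/sprintf('\\x{%x}', ord $1)/ge;
    }
    else {
        # Flag off: bytes 0x80..0xFF become three-digit octal escapes.
        $s =~ s/([\x80-\xff])/sprintf('\\%03o', ord $1)/ge;
    }
    return qq{"$s"};
}

my $bytes = "caf\xe9";    # flag off
my $chars = "caf\xe9";
utf8::upgrade($chars);    # same characters, flag on

print qquote_sketch($bytes), "\n";    # prints "caf\351"
print qquote_sketch($chars), "\n";    # prints "caf\x{e9}"
```

Because the flag is read before any substitution runs, the two identical-looking strings get different spellings, which is exactly the information the choice of escape carries.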

This is actually significant information: it means you can round-trip a
raw string of bytes and have it come back with the utf8 flag off, or send
the same "chars" as utf8. (Although whether this actually works via
*eval* is another question.)
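A small sketch for probing that question, using two hand-written literals (not actual Data::Dumper output) for the same characters, one in each escape style. The characters survive either spelling; whether the flag survives is the open question, so the script prints it rather than assuming an answer:

```perl
use strict;
use warnings;

# Two hand-written dumps of the same four characters, "caf" plus U+00E9:
# one in the octal style (flag off), one in the \x{..} style (flag on).
my %literal = (
    octal => '"caf\351"',
    hex   => '"caf\x{e9}"',
);

for my $style (sort keys %literal) {
    my $value = eval $literal{$style};
    die $@ if $@;
    printf "%-5s same chars: %s, utf8 flag after eval: %s\n",
        $style,
        ($value eq "caf\xe9" ? "yes" : "no"),
        (utf8::is_utf8($value) ? "on" : "off");
}
```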

> As an irrelevant aside,
>>    s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
> … it’s a mystery to me why the replacement expression was spelled
>    '\x{'.sprintf('%x',...).'}'
> instead of simply
>    sprintf('\x{%x}',...)
> and similarly for several other substitutions within the function.
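A quick check that the two spellings produce the same result. The first substitution is the quoted Data::Dumper line, the second is the suggested form; the sample string is made up for illustration:

```perl
use strict;
use warnings;

my $s = "na\x{ef}ve \x{263a}";

# Spelling from the quoted Data::Dumper line: concatenate around sprintf.
(my $concat = $s) =~ s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge;

# Suggested spelling: put the literal parts in the sprintf format.
(my $fmt = $s) =~ s/([^\x00-\x7f])/sprintf('\x{%x}',ord($1))/ge;

print "$concat\n";                                   # prints na\x{ef}ve \x{263a}
print $concat eq $fmt ? "identical\n" : "different\n";    # prints identical
```

Single quotes matter in both forms: `'\x{'` and `'\x{%x}'` keep the backslash literal, so the output is escape text rather than the characters themselves.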

Yes, indeed, I was thinking the same thing. Sorry about the tone of my
previous mail; I shouldn't have replied without saying more.


perl -Mre=debug -e "/just|another|perl|hacker/"
