Re: [perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]

June 7, 2012 15:45
On 26 May 2012 03:06, Jim Avera <> wrote:
> # New Ticket Created by  Jim Avera
> # Please include the string:  [perl #113088]
> # in the subject line of all future correspondence about this issue.
> # <URL: >
> This is a bug report for perl from,
> generated with the help of perlbug 1.39 running under perl 5.12.4.
> -----------------------------------------------------------------
> Data::Dumper contains support for encoding non-ASCII characters
> as themselves, not \x{...} escapes.  This is controlled by setting
> Useqq() to one of the special values 'iso8859', 'utf8', or '8bit'.
> The code is commented as "not supported...SUBJECT TO CHANGE".  Fair
> enough. But it's currently completely broken, and I think the fix is
> simple (patch below).
> Early in sub qquote() there is the following:
>   my $bytes; { use bytes; $bytes = length }
>   s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
> This removes all wide characters and the upper half of the
> single-octet range before reaching the encoding-support code.
> Therefore, the encoding-support can't do anything useful; all the
> "interesting" characters have already been converted to \x{...} escapes.
> I suspect those lines were added to speed up dumping of huge binary
> blobs which are not really printable strings.  However it seems wrong to
> test for #chars != #bytes, because binary data _should_ be passed
> as byte strings, that is, with Perl's internal utf8 flag off.
> In that case #chars == #bytes and the optimization would not happen anyway.

They are only converted if the string is utf8.
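
To illustrate, here is a minimal sketch of the same bytes-versus-characters test in isolation. The helper name would_escape() is made up for this example and is not part of Data::Dumper; it only mirrors the "$bytes > length" condition quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Data::Dumper): returns true when the
# "$bytes > length" test from qquote() would trigger the \x{...} escaping.
sub would_escape {
    my ($str) = @_;
    my $bytes = do { use bytes; length $str };   # internal byte length
    return $bytes > length $str;                 # compared with character length
}

my $latin1   = "caf\xe9";      # byte string, utf8 flag off: 4 chars, 4 bytes
my $upgraded = $latin1;
utf8::upgrade($upgraded);      # same 4 chars, utf8 flag on: 5 bytes internally
my $wide     = "\x{263A}";     # wide character, always stored as utf8

# The byte string is left alone; the two utf8-flagged strings would be escaped.
printf "%-8s %s\n", $_->[0], would_escape($_->[1]) ? "escaped" : "untouched"
    for [ latin1 => $latin1 ], [ upgraded => $upgraded ], [ wide => $wide ];
```

So a string holding only characters below 0x100 takes a different path depending on whether it happens to be stored as utf8 internally.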

I think this is an attempt to preserve Unicode semantics on the string
after serialization. I believe that the idea is that \x{..} produces a
unicode codepoint, although whether it actually does in all perls is
another matter. Data::Undump, however, *will* respect this.

> So I'd like to propose to fix this by changing the above code to
>   s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge
>     unless utf8::is_utf8($_);

No, I really don't think this is a good idea.

> This will make the "fast exit" occur for
>   . character strings which contain only ASCII characters
>   . binary strings with no values below \x20 (space)
> Strings with non-ASCII characters (or bytes < \x20, if a binary string)
> will fall through to the encoding-support code.

I don't think this is the right fix. IMO the right fix is to use a
different routine than qquote() to handle strings for alternate
encodings.

Try using $Useqq and then doing something like

local *qquote = sub { ... };

before you call Data::Dumper. Not sure if you can override key quoting
as easily.
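
A rough sketch of that override, under some assumptions: literal_qquote() and dump_literal() are names invented for this example, the escaping rules are only one plausible choice, and $Data::Dumper::Useperl is set so that the pure-Perl code path (the one that calls qquote()) is used.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical replacement for Data::Dumper::qquote(): escapes only what a
# double-quoted literal needs and leaves non-ASCII characters as themselves.
sub literal_qquote {
    my ($str) = @_;    # Data::Dumper also passes the Useqq value; ignored here
    $str =~ s/([\\"\$\@])/\\$1/g;                               # \ " $ @
    $str =~ s/([\x00-\x1f\x7f])/sprintf('\\x{%x}', ord $1)/ge;  # control chars
    return qq("$str");
}

# Hypothetical wrapper: swaps in the replacement only for the duration of
# one Dumper() call, then lets local() restore the original.
sub dump_literal {
    my ($data) = @_;
    no warnings 'redefine';
    local *Data::Dumper::qquote  = \&literal_qquote;
    local $Data::Dumper::Useqq   = 1;
    local $Data::Dumper::Useperl = 1;   # assumption: force the pure-Perl dumper
    local $Data::Dumper::Terse   = 1;
    local $Data::Dumper::Indent  = 0;
    return Dumper($data);
}

binmode STDOUT, ':encoding(UTF-8)';
print dump_literal("caf\x{e9} \x{263A}"), "\n";
```

Because the override is done with local, it disappears again when dump_literal() returns, so other users of Data::Dumper in the same process are not affected.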


perl -Mre=debug -e "/just|another|perl|hacker/"
