On 26 May 2012 03:06, Jim Avera <perlbug-followup@perl.org> wrote:
> # New Ticket Created by Jim Avera
> # Please include the string: [perl #113088]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=113088 >
>
> This is a bug report for perl from james_avera@yahoo.com,
> generated with the help of perlbug 1.39 running under perl 5.12.4.
>
> -----------------------------------------------------------------
> Data::Dumper contains support for encoding non-ASCII characters
> as themselves, not \x{...} escapes. This is controlled by setting
> Useqq() to one of the special values 'iso8859', 'utf8', or '8bit'.
>
> The code is commented as "not supported...SUBJECT TO CHANGE". Fair
> enough. But it's currently completely broken, and I think the fix is
> simple (patch below).
>
> Early in sub qquote() there is the following:
>
>     my $bytes; { use bytes; $bytes = length }
>     s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
>
> This removes all wide characters and the upper half of the
> single-octet range before reaching the encoding-support code.
> Therefore, the encoding-support can't do anything useful; all the
> "interesting" characters have already been converted to \x{...} escapes.
>
> I suspect those lines were added to speed up dumping of huge binary
> blobs which are not really printable strings. However it seems wrong to
> test for #chars != #bytes, because binary data _should_ be passed
> as byte strings, that is, with Perl's internal utf8 flag off.
> In that case #chars===#bytes and the optimization would not happen anyway.

They are only converted if the string is utf8. I think this is an
attempt to preserve Unicode semantics on the string after
serialization. I believe the idea is that \x{..} produces a Unicode
codepoint, although whether it actually does in all perls is another
matter. Data::Undump, however, *will* respect this.
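To make the "only converted if the string is utf8" point concrete, here is a minimal sketch that wraps the two quoted lines in a standalone helper. The name escape_if_wide and the sample strings are made up for illustration; this is not the actual body of qquote(), only the two lines under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: applies only the two lines quoted from qquote().
sub escape_if_wide {
    local $_ = shift;
    my $bytes;
    { use bytes; $bytes = length }   # octets in Perl's internal representation
    # More octets than characters can only happen for a utf8-flagged string
    # that holds at least one non-ASCII character.
    s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
    return $_;
}

# Character string: the wide character forces the internal utf8 flag on.
my $char_string = "caf\x{e9} \x{263a}";
# Byte string: same e-acute as a single Latin-1 octet, utf8 flag off.
my $byte_string = "caf\xe9";

my $escaped_chars = escape_if_wide($char_string);   # caf\x{e9} \x{263a} (literal escapes)
my $escaped_bytes = escape_if_wide($byte_string);   # unchanged, still 4 octets
```

The byte string passes through untouched because its octet count equals its character count, which is the case the original report says the test never catches.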
> So I'd like to propose to fix this by changing the above code to
>
>     s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge
>         unless utf8::is_utf8($_);

No, I really don't think this is a good idea.

> This will make the "fast exit" occur for
>   . character strings which contain only ASCII characters
>   . binary strings with no values below \x20 (space)
>
> Strings with non-ASCII characters (or bytes < \x20, if a binary string)
> will fall through to the encoding-support code.

I don't think this is the right fix. IMO the right fix is to use a
different routine than qquote() to handle strings for alternate
encodings.

Try using $Useqq and then doing something like

    local *qquote = sub { ... };

before you call Data::Dumper. Not sure if you can override key quoting
as easily.

cheers
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
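The override suggested above might look roughly like the following. This is a sketch under assumptions, not a tested recipe for every Data::Dumper release: literal_qquote and dump_literal are hypothetical names, the glob is spelled out as *Data::Dumper::qquote because the override is installed from outside the package, and $Data::Dumper::Useperl is set on the assumption that only the pure-Perl dumper calls qquote(). It dumps a plain scalar only, since key quoting may not be overridable this way.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical replacement for qquote(): escapes only what is unsafe inside a
# double-quoted Perl string and leaves non-ASCII characters as themselves.
sub literal_qquote {
    my ($str) = @_;                     # a second argument (the Useqq value) is ignored
    $str =~ s/([\\\"\$\@])/\\$1/g;      # backslash, double quote, $ and @
    $str =~ s/([\x00-\x1f\x7f])/sprintf('\\x{%x}', ord $1)/ge;   # control characters
    return qq{"$str"};
}

# Hypothetical wrapper: installs the override only for the duration of one dump.
sub dump_literal {
    my ($value) = @_;
    no warnings 'redefine';
    local *Data::Dumper::qquote   = \&literal_qquote;
    local $Data::Dumper::Useqq    = 1;   # send strings through qquote()
    local $Data::Dumper::Useperl  = 1;   # assumption: force the pure-Perl code path
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 = " prefix
    local $Data::Dumper::Indent   = 0;
    return Dumper($value);
}

# The e-acute should survive as itself; the tab and the dollar sign get escaped.
my $dumped = dump_literal("caf\x{e9}\t\$x");
```

Because the glob assignment is localized, the stock qquote() is restored as soon as dump_literal returns, so other callers of Data::Dumper are unaffected.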