Front page | perl.perl5.porters |
Postings from May 2012
[perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
Thread Next
From:
Jim Avera
Date:
May 25, 2012 18:07
Subject:
[perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
Message ID:
rt-3.6.HEAD-7788-1337994418-1618.113088-75-0@perl.org
# New Ticket Created by Jim Avera
# Please include the string: [perl #113088]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=113088 >
This is a bug report for perl from james_avera@yahoo.com,
generated with the help of perlbug 1.39 running under perl 5.12.4.
-----------------------------------------------------------------
Data::Dumper contains support for encoding non-ASCII characters
as themselves, not \x{...} escapes. This is controlled by setting
Useqq() to one of the special values 'iso8859', 'utf8', or '8bit'.
The code is commented as "not supported...SUBJECT TO CHANGE". Fair
enough. But it's currently completely broken, and I think the fix is
simple (patch below).
Early in sub qquote() there is the following:
my $bytes; { use bytes; $bytes = length }
s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
This removes all wide characters and the upper half of the
single-octet range before reaching the encoding-support code.
Therefore, the encoding-support can't do anything useful; all the
"interesting" characters have already been converted to \x{...} escapes.
I suspect those lines were added to speed up dumping of huge binary
blobs which are not really printable strings. However it seems wrong to
test for #chars != #bytes, because binary data _should_ be passed
as byte strings, that is, with Perl's internal utf8 flag off.
In that case #chars===#bytes and the optimization would not happen anyway.
So I'd like to propose to fix this by changing the above code to
s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge
unless utf8::is_utf8($_);
This will make the "fast exit" occur for
. character strings which contain only ASCII characters
. binary strings with no values below \x20 (space)
Strings with non-ASCII characters (or bytes < \x20, if a binary string)
will fall through to the encoding-support code.
Here is a patch, followed by a test program (sorry about line-wraps;
Thunderbird is not nice...):
--- Data/Dumper.pm.ORIG 2012-05-25 19:06:41.775175838 +0000
+++ Data/Dumper.pm 2012-05-26 00:43:04.632097843 +0000
@@ -680,8 +680,9 @@
sub qquote {
local($_) = shift;
s/([\\\"\@\$])/\\$1/g;
- my $bytes; { use bytes; $bytes = length }
- s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
+ { use utf8;
+ s/([^\x00-\x7f])/'\x{'.sprintf("%x",ord($1)).'}'/ge unless
utf8::is_utf8($_);
+ }
return qq("$_") unless
/[^ !"\#\$%&'()*+,\-.\/0-9:;<=>?\@A-Z[\\\]^_`a-z{|}~]/; # fast exit
#!/usr/bin/perl
use strict; use warnings;
use utf8;
use Encode ();
use Data::Dumper;
binmode STDOUT, 'encoding(utf-8)';
my $chars = "Hello world \N{U+263A} \x{7F}\n"; # smiley DEL
my $octets = Encode::encode('utf-8', $chars);
utf8::is_utf8($chars) && print "chars is_utf8\n";
utf8::is_utf8($octets) && print "octets is_utf8\n";
print "length(chars)=", length($chars), "\n";
print "length(octets)=", length($octets), "\n";
print Data::Dumper->new([$octets],['*octets'])->Useqq(1)->Dump;
print Data::Dumper->new([$chars],['*chars'])->Useqq(1)->Dump;
print Data::Dumper->new([$octets],['*octets'])->Useqq('utf8')->Dump;
print Data::Dumper->new([$chars],['*chars'])->Useqq('utf8')->Dump;
exit 0;
# ---OUTPUT---
# chars is_utf8
# length(chars)=16
# length(octets)=18
# $octets = "Hello world \x{e2}\x{98}\x{ba} \177\n";
# $chars = "Hello world \x{263a} \177\n";
# $octets = "Hello world \x{e2}\x{98}\x{ba} \177\n";
# $chars = "Hello world ☺ \177\n"; # need a utf-8 terminal to see this
-----------------------------------------------------------------
---
Flags:
category=library
severity=low
module=Data::Dumper
---
Site configuration information for perl 5.12.4:
Configured by Debian Project at Tue Sep 6 08:08:24 UTC 2011.
Summary of my perl5 (revision 5 version 12 subversion 4) configuration:
Platform:
osname=linux, osvers=2.6.24-28-server,
archname=x86_64-linux-gnu-thread-multi
uname='linux allspice 2.6.24-28-server #1 smp wed aug 18 21:17:51
utc 2010 x86_64 x86_64 x86_64 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.12 -Darchlib=/usr/lib/perl/5.12
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.12.4
-Dsitearch=/usr/local/lib/perl/5.12.4 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1
-Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm
-Uusesfio -Uusenm -Ui_libutil -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib
-Dlibperl=libperl.so.5.12.4 -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=define, use64bitall=define, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN
-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include
-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing
-pipe -fstack-protector -I/usr/local/include'
ccversion='', gccversion='4.6.1', gccosandvers=''
intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
libpth=/usr/local/lib /lib/x86_64-linux-gnu /lib/../lib
/usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib /usr/lib
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=, so=so, useshrplib=true, libperl=libperl.so.5.12.4
gnulibc_version='2.13'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib
-fstack-protector'
Locally applied patches:
---
@INC for perl 5.12.4:
/home/jima/local/share/perl/5.12.4
/home/jima/local/share/perl
/home/jima/lib/perl
/etc/perl
/usr/local/lib/perl/5.12.4
/usr/local/share/perl/5.12.4
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.12
/usr/share/perl/5.12
/usr/local/lib/site_perl
.
---
Environment for perl 5.12.4:
HOME=/home/jima
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH=/home/jima/local/lib
LOGDIR (unset)
PATH=/home/jima/bin:/home/jima/local/bin:/home/jima/jima_tools/x86_64/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/bin/X11:/usr/local/bin:/opt/openoffice.org3/program:/usr/local/games:/usr/games:.
PERL5LIB=/home/jima/local/share/perl:/home/jima/lib/perl
PERL_BADLANG (unset)
SHELL=/bin/bash
Thread Next
-
[perl #113088] Data::Dumper::Useqq('utf8') broken [PATCH]
by Jim Avera