Front page | perl.perl5.porters |
Postings from March 2007
[perl #42102] unpack use internal string representation (utf8)
From: powerman @ powerman . asdfGroup . com
Date: March 26, 2007 17:04
Subject: [perl #42102] unpack use internal string representation (utf8)
Message ID: rt-3.6.HEAD-30201-1174941277-750.42102-75-0@perl.org
# New Ticket Created by powerman@powerman.asdfGroup.com
# Please include the string: [perl #42102]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=42102 >
This is a bug report for perl from powerman@powerman.asdfGroup.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.
-----------------------------------------------------------------
[Please enter your report here]
pcalc> $s1 = $s = "\xAA\xBB\xCC"; utf8::upgrade $s1
pcalc> x map {sprintf "%x", $_} unpack "CCC", $s
$VAR1 = 'aa';
$VAR2 = 'bb';
$VAR3 = 'cc';
pcalc> x map {sprintf "%x", $_} unpack "CCC", $s1
$VAR1 = 'c2';
$VAR2 = 'aa';
$VAR3 = 'c2';
pcalc> x map {sprintf "%x", $_} unpack "n", $s
$VAR1 = 'aabb';
pcalc> x map {sprintf "%x", $_} unpack "n", $s1
$VAR1 = 'c2aa';
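The c2/aa/c2 values above are the first three octets of the UTF-8 encoding of the same three characters. Here is a minimal standalone sketch of that relationship, assuming nothing beyond core perl. The helper name hex_bytes is only illustrative. The script makes the internal encoding explicit with utf8::encode rather than relying on how unpack treats an upgraded string, so it does not itself depend on the behaviour being reported:

```perl
use strict;
use warnings;

# Illustrative helper (not part of any module): hex value of every byte
# that unpack "C*" sees in a plain byte string.
sub hex_bytes {
    my ($bytes) = @_;
    return map { sprintf "%x", $_ } unpack "C*", $bytes;
}

my $s  = "\xAA\xBB\xCC";   # three characters, stored as three bytes
my $s1 = $s;
utf8::upgrade($s1);        # same three characters, stored internally as UTF-8

# As character strings the two are still equal.
my $same_chars = ($s eq $s1) ? 1 : 0;

# utf8::encode turns the characters into their UTF-8 octets, i.e. the
# bytes perl keeps internally for the upgraded string.
my $internal = $s1;
utf8::encode($internal);

my @plain   = hex_bytes($s);
my @encoded = hex_bytes($internal);

print "characters equal: $same_chars\n";
print "bytes of \$s:        @plain\n";
print "UTF-8 bytes of \$s1: @encoded\n";
```

This should print "aa bb cc" for $s and "c2 aa c2 bb c3 8c" for the encoded copy. The first three octets of the latter match what unpack "CCC" returned for $s1 above.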
I actually ran into this issue while using JSON::XS and Compress::Zlib.
I received an HTTP reply from a web server, packed it into JSON, and
transferred it to another part of my application, which unpacks it from JSON
(at this point my bytes become marked 'UTF8', as Devel::Peek shows).
Compress::Zlib then fails to ungzip this HTTP reply because it uses:
sub _removeGzipHeader
...
unpack ('CCCCVCC', $$string);
The JSON::XS author says it's a bug in perl and asked me to send a bug report.
pcalc> $s = "\xAA\xBB\xCC"; $s1=from_json(to_json([$s]))->[0];
pcalc> Dump $s
SV = PV(0x1040cc40) at 0x1051c060
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x105e04d8 "\252\273\314"\0
CUR = 3
LEN = 4
pcalc> Dump $s1
SV = PVMG(0x1051c4e8) at 0x1051c084
REFCNT = 1
FLAGS = (SMG,POK,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x105e7820 "\302\252\302\273\303\214"\0 [UTF8 "\x{aa}\x{bb}\x{cc}"]
CUR = 6
LEN = 8
MAGIC = 0x105e7910
MG_VIRTUAL = &PL_vtbl_utf8
MG_TYPE = PERL_MAGIC_utf8(w)
MG_LEN = 3
pcalc> x map {sprintf "%o", $_} unpack "CCC", $s
$VAR1 = '252';
$VAR2 = '273';
$VAR3 = '314';
pcalc> x map {sprintf "%o", $_} unpack "CCC", $s1
$VAR1 = '302';
$VAR2 = '252';
$VAR3 = '302';
pcalc> utf8::downgrade $s1
pcalc> Dump $s1
SV = PV(0x10d389b0) at 0x10e47e24
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x10f12e90 "\252\273\314"\0
CUR = 3
LEN = 8
pcalc> x map {sprintf "%o", $_} unpack "CCC", $s1
$VAR1 = '252';
$VAR2 = '273';
$VAR3 = '314';
Right now I'm using utf8::downgrade as a workaround...
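A minimal sketch of how that workaround could be wrapped, assuming the data really is binary (no character above 0xFF). The helper name unpack_bytes is hypothetical and not part of any module. It downgrades a copy, so the caller's string keeps whatever internal representation it had:

```perl
use strict;
use warnings;

# Hypothetical helper: run unpack on a copy of the data that has been
# downgraded to plain bytes first.
sub unpack_bytes {
    my ($template, $data) = @_;   # $data is a copy; the caller's string is untouched
    utf8::downgrade($data, 1)     # true second argument: return false instead of dying
        or die "data contains characters above 0xFF, not binary data\n";
    return unpack $template, $data;
}

my $s1 = "\xAA\xBB\xCC";
utf8::upgrade($s1);               # simulate the string coming back marked UTF8

my @hex    = map { sprintf "%x", $_ } unpack_bytes("CCC", $s1);
my ($word) = unpack_bytes("n", $s1);

print "@hex\n";
printf "%x\n", $word;
```

With the downgrade in place this should print "aa bb cc" and "aabb", the same results as for the original byte string $s.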
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.8:
Configured by Gentoo at Mon Oct 30 06:06:29 EET 2006.
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=linux, osvers=2.6.16-hardened-r11, archname=i686-linux
uname='linux home 2.6.16-hardened-r11 #9 smp mon oct 30 04:43:33 eet 2006 i686 intel(r) core(tm)2 cpu 6600 @ 2.40ghz gnulinux '
config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth= -Doptimize=-march=pentium-m -msse3 -O2 -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux 5.8.7 5.8.7/i686-linux -Dcf_by=Gentoo -Ud_csh -Dusenm -Di_ndbm -Di_gdbm -Di_db'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-march=pentium-m -msse3 -O2 -pipe',
cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement'
ccversion='', gccversion='3.4.6 (Gentoo Hardened 3.4.6-r1, ssp-3.4.5-1.0, pie-8.7.9)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.3.6.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.3.6'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.8:
/etc/perl
/usr/lib/perl5/vendor_perl/5.8.8/i686-linux
/usr/lib/perl5/vendor_perl/5.8.8
/usr/lib/perl5/vendor_perl
/usr/lib/perl5/site_perl/5.8.8/i686-linux
/usr/lib/perl5/site_perl/5.8.8
/usr/lib/perl5/site_perl
/usr/lib/perl5/5.8.8/i686-linux
/usr/lib/perl5/5.8.8
/usr/local/lib/site_perl
.
---
Environment for perl v5.8.8:
HOME=/home/powerman
LANG=ru_RU.KOI8-R
LANGUAGE (unset)
LC_NUMERIC=POSIX
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/powerman/bin:/home/powerman/inferno-os/Linux/386/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/sbin:/usr/sbin:/usr/local/sbin:/usr/games/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.4.6:/opt/sun-jdk-1.4.2.13/bin:/opt/sun-jdk-1.4.2.13/jre/bin:/opt/sun-jdk-1.4.2.13/jre/javaws:/usr/kde/3.5/bin:/usr/qt/3/bin:/usr/games/bin:/opt/vmware/workstation/bin:/var/qmail/bin
PERL_BADLANG (unset)
SHELL=/bin/bash