[perl #33734] unpack fails on utf-8 strings

Marc Lehmann
January 9, 2005 15:45
# New Ticket Created by  Marc Lehmann 
# Please include the string:  [perl #33734]
# in the subject line of all future correspondence about this issue. 

This is a bug report for perl from,
generated with the help of perlbug 1.35 running under perl v5.8.6.


The following program should output "65535" twice, but doesn't:

   use Convert::Scalar;
   my $s = "\xff\xff";
   printf "%d\n", unpack "n", $s;
   Convert::Scalar::utf8_upgrade $s;
   printf "%d\n", unpack "n", $s;

The program creates the string "\xff\xff" and runs it through unpack
twice: once while the string is stored internally as latin1, and once
after it has been upgraded to utf-8 (Convert::Scalar::utf8_upgrade just
runs utf8_upgrade on the string).

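The result must be the same in both cases (the string content is the
same), but the second print gives "50111". That is 0xC3BF: unpack reads
the first two bytes of the internal utf-8 encoding ("\xc3\xbf\xc3\xbf")
instead of the two characters.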
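To see where 50111 comes from, the sketch below uses only core functions (utf8::encode and sprintf's %vX vector flag, not Convert::Scalar) to build the byte sequence that perl stores for the upgraded string and unpacks those bytes directly. It assumes, as the output above suggests, that perl 5.8.6's unpack reads exactly these internal bytes.

```perl
use strict;
use warnings;

my $s = "\xff\xff";            # two characters, each U+00FF

# utf8::encode turns a copy into the byte sequence perl uses internally
# once a string has been upgraded to utf-8.
my $internal = $s;
utf8::encode($internal);

printf "%vX\n", $internal;               # C3.BF.C3.BF
printf "%d\n", unpack "n", $internal;    # 50111 (0xC3BF)
printf "%d\n", unpack "n", $s;           # 65535 (0xFFFF)
```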

As the internal encoding (whether latin1 or utf8) does NOT change the
string at the perl level, unpack must work consistently.

(I found this bug because for some reason perl upgraded my string to
utf-8 internally, causing very funny effects when I ran various unpacks
to decode the protocol. As perl can upgrade strings in various unexpected
ways and there is no easy workaround at the perl level, I chose severity
"high": feel free to correct this :)

The fix is for unpack to downgrade the string to latin1 before decoding
it, or to fail if the string cannot be downgraded.
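Until unpack does this itself, the same downgrade can be approximated at the perl level by calling core utf8::downgrade before every unpack. This is a minimal sketch, assuming a hypothetical helper named unpack_bytes (not part of perl or Convert::Scalar). It uses core utf8::upgrade in place of Convert::Scalar::utf8_upgrade so it runs without CPAN modules.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl or Convert::Scalar): unpack
# TEMPLATE against the latin1 (byte) form of STRING, regardless of
# whether perl currently stores STRING internally as latin1 or utf-8.
sub unpack_bytes {
    my ($template, $string) = @_;
    my $copy = $string;                  # leave the caller's scalar untouched
    # With FAIL_OK true, utf8::downgrade returns false instead of dying
    # when the string holds a character above 0xFF.
    utf8::downgrade($copy, 1)
        or die "unpack_bytes: string contains characters above 0xFF\n";
    return unpack $template, $copy;
}

my $s = "\xff\xff";
printf "%d\n", unpack_bytes("n", $s);    # latin1 internally: 65535
utf8::upgrade($s);                       # force the utf-8 representation
printf "%d\n", unpack_bytes("n", $s);    # utf-8 internally: 65535
```

The helper downgrades a copy rather than the caller's scalar, and fails loudly instead of silently truncating characters above 0xFF, matching the "or fail" half of the proposed fix. perl 5.10 changed unpack to operate on characters rather than on the internal encoding by default, so there the downgrade should not change the result.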

Site configuration information for perl v5.8.6:

Configured by Marc Lehmann at Tue Nov 30 00:54:44 CET 2004.

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
    osname=linux, osvers=2.6.10-rc1, archname=amd64-linux
    uname='linux cerebro 2.6.10-rc1 #1 smp mon nov 22 05:47:21 cet 2004 x86_64 gnulinux '
    config_args='-Duselargefiles -Dxuse64bitint -Uxuse64bitall -Dusemymalloc=y -Dcc=gcc-3.4 -Dccflags=-ggdb -Dcppflags=-D_GNU_SOURCE -I/opt/include -Doptimize=-O4 -march=opteron -mtune=opteron -funroll-loops -fno-strict-aliasing -Dcccdlflags=-fPIC -Dldflags=-L/opt/perl/lib -L/opt/lib -Dlibs=-ldl -lm -lcrypt -Darchname=amd64-linux -Dprefix=/opt/perl -Dprivlib=/opt/perl/lib/perl5 -Darchlib=/opt/perl/lib/perl5 -Dvendorprefix=/opt/perl -Dvendorlib=/opt/perl/lib/perl5 -Dvendorarch=/opt/perl/lib/perl5 -Dsiteprefix=/opt/perl -Dsitelib=/opt/perl/lib/perl5 -Dsitearch=/opt/perl/lib/perl5 -Dsitebin=/opt/perl/bin -Dman1dir=/opt/perl/man/man1 -Dman3dir=/opt/perl/man/man3 -Dsiteman1dir=/opt/perl/man/man1 -Dsiteman3dir=/opt/perl/man/man3 -Dman1ext=1 -Dman3ext=3 -Dpager=/usr/bin/less -Uafs -Uusesfio -Uusenm -Uuseshrplib -Dd_dosuid -Dusethreads=undef -Duse5005threads=undef -Duseithreads=undef -Dusemultiplicity=undef -Dcf_by=Marc Lehmann -Dlocincpth=/opt/perl/include /opt/include -Dmyhostname=localhost -Dmultiarch=undef -Dbin=/opt/perl/bin -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
    cc='gcc-3.4', ccflags ='-ggdb -fno-strict-aliasing -pipe -I/opt/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O4 -march=opteron -mtune=opteron -funroll-loops -fno-strict-aliasing',
    cppflags='-D_GNU_SOURCE -I/opt/include -ggdb -fno-strict-aliasing -pipe -I/opt/include'
    ccversion='', gccversion='3.4.2 (Debian 3.4.2-3)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc-3.4', ldflags ='-L/opt/perl/lib -L/opt/lib -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-ldl -lm -lcrypt
    perllibs=-ldl -lm -lcrypt
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/opt/perl/lib -L/opt/lib -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.8.6:

Environment for perl v5.8.6:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)

