develooper Front page | perl.perl5.porters | Postings from January 2001

qu() exposes utf8 hash key problem

Nicholas Clark
January 20, 2001 14:25
This is a bug report for perl from,
generated with the help of perlbug 1.33 running under perl v5.7.0.

[Please enter your report here]

using the utf8 representation of codepoints 128-255 as a hash key seems to
produce some undesirable effects.
[I'm using a '£' (pound sterling) as my test character - if this gets stripped
to 7 bit you will see hash '#'. The next hash after this sentence is in
the OS version "2.2.17-rmk1 #9"]

I assume that these occur with substr and utf8 scalars, but they are very
easy to make with the new qu operator

the strings are equal, which (I believe) is correct:

perl -le '$uni = qu(£); $eight = "£"; print $uni eq $eight'

however, interesting things start happening with hash keys:

perl  -MDevel::Peek -le '$a{qu(£)} = "foo"; $a{"£"} = "bar" ; foreach (keys %a) {Dump($_)}'
SV = PVIV(0x20d8690) at 0x20d7e94
  REFCNT = 2
  IV = 168
  PV = 0x20e40a0 "\243"
  CUR = 1
  LEN = 0
SV = PVIV(0x20d86e0) at 0x20e25d0
  REFCNT = 2
  IV = 6770
  PV = 0x20e3eb8 "\302\243"
  CUR = 2
  LEN = 0

I shouldn't get 2 hash entries, should I?
[for the FAKE,READONLY SV the hash value is cached in the IV, so you can see
that the two representations have hashed to different numbers]

perl -wle '$a{qu(£)} = "foo"; $a{qw(£)} = "bar" ; foreach (keys %a) {print $_};'
Attempt to free non-existent shared string '£'.

perl -wle '$uni = qu(£); $eight = "£"; $a{$uni} = "foo"; $a{$eight} = "bar"; foreach (keys %a) {print $a{$_}}' 

perl -wle '$uni = qu(£); $eight = "£"; $a{$uni} = "foo"; $a{$eight} = "bar"; foreach (keys %a) {print $_; print $a{$_}}'
Use of uninitialized value in print at -e line 1.

Attempt to free non-existent shared string '£'.

the warnings are explained by:

perl -MDevel::Peek -wle '$uni = qu(£); $eight = "£"; $a{$uni} = "foo"; $a{$eight} = "bar"; foreach (keys %a) {print $_; Dump($_)}'
SV = PVIV(0x20d8690) at 0x20d7e94
  REFCNT = 2
  IV = 168
  PV = 0x20e07e0 "\243"
  CUR = 1
  LEN = 0
SV = PVIV(0x20d86c0) at 0x20e25f8
  REFCNT = 2
  IV = 6770
  PV = 0x20dbd88 "\243"
  CUR = 1
  LEN = 0
Attempt to free non-existent shared string '£'.

*something* is feeling quite happy to mess with a readonly scalar

for information

1: it seems no errors are currently being generated if shared strings remain
   at global destruction time.
2: SvREADONLY_off() is a scary thing. Perl_ck_require uses it indiscriminately
   without force_normal to append ".pm" (would a patch be wanted for that?
   It doesn't affect anything *yet*). I'm guessing something else is doing
   something equally horrible on output.

I guess we need a canonical representation for hash keys which contain at least
one codepoint in the range 128-255 but none >255. Possibly downgraded to
8 bit. Or possibly upgraded to utf8.

Sorry, I have no patches for the above things.

Nicholas Clark

[Please do not change anything below this line]
Site configuration information for perl v5.7.0:

Configured by nick at Thu Jan 18 19:24:14 GMT 2001.

Summary of my perl5 (revision 5.0 version 7 subversion 0) configuration:
    osname=linux, osvers=2.2.17-rmk1, archname=armv4l-linux
    uname='linux 2.2.17-rmk1 #9 fri dec 8 23:52:12 gmt 2000 armv4l unknown '
    config_args='-Dusedevel -Ubincompat5005 -Uinstallusrbinperl -Dinc_version_list=  -Dinc_version_list_init=0 -Duseperlio -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.2 20000220 (Debian GNU/Linux)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=8
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:

@INC for perl v5.7.0:

Environment for perl v5.7.0:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
