perl.perl5.porters | Postings from September 2000

[ID 20000921.001] utf8-related mangling of 8-bit data

Bart Schuller
September 21, 2000 01:54
This is a bug report for perl from,
generated with the help of perlbug 1.28 running under perl v5.6.0.

[Please enter your report here]

We've just observed a bug that unfortunately won't reduce to a simple test
case at this moment. It goes something like this:

    my $text = 'static string';
    print STDERR $somevar;
    # $somevar contains two bytes that taken together can be interpreted as
    # one utf8 character. Nowhere in the whole project do we "use utf8", this
    # script should be able to run on 5.005.
    # The print correctly outputs those two bytes (gibberish when viewed as
    # latin1).

    $text .= $somevar;
    print STDERR $text;
    # This print unexpectedly shows not two but four characters with the high
    # bit on. My guess is that they are the utf8-encoded equivalents of the
    # original two bytes.

I would love to post an actual piece of code, but currently this only happens
when the module in question runs under mod_perl.
Are there any debugging tools that would for instance allow me to print the
UTF8-ness of the variables in question?
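
One way to inspect the UTF8-ness of a variable is sketched below. It is only
a sketch under stated assumptions: it relies on utf8::is_utf8() and
utf8::upgrade(), which exist in later perls (5.8.1 and up) and not in 5.6.0,
and describe_string() and the sample byte values are hypothetical stand-ins
for the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a scalar carries the internal UTF8
# flag, its length in characters, and its length in internal bytes.
sub describe_string {
    my ($label, $str) = @_;
    my $flag  = utf8::is_utf8($str) ? 'on' : 'off';
    my $chars = length $str;
    my $bytes = do { use bytes; length $str };   # byte semantics in this block only
    return sprintf '%s: utf8 flag %s, %d chars, %d internal bytes',
        $label, $flag, $chars, $bytes;
}

# Stand-in for $somevar: two high-bit bytes that also happen to form one
# valid utf8 sequence (an assumption, the real value is not in the report).
my $somevar = "\xC3\xA9";

my $text = 'static string';
$text .= $somevar;

print STDERR describe_string('somevar', $somevar), "\n";
print STDERR describe_string('text', $text), "\n";

# For comparison: the same two characters after a forced upgrade to the
# internal utf8 representation. Each high-bit character now takes two bytes.
my $upgraded = $somevar;
utf8::upgrade($upgraded);
print STDERR describe_string('upgraded', $upgraded), "\n";
```

If one of the strings in the concatenation reports the flag as on, the
doubling from two bytes to four would be consistent with the guess above.
Devel::Peek's Dump() is another option: it prints a scalar's FLAGS, which
include UTF8 when the flag is set.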

The code can be made to work correctly by putting a "use bytes" on top.
However, string concatenation changing the data is a bug in my book.

Lastly, the reason that this is a perl with ActiveState patches is that we
wanted a 5.6.0 with as few bugs as possible, and 5.6.1 is not out yet. If
5.7.0 would be more suitable, I'll try that.

[Please do not change anything below this line]
Site configuration information for perl v5.6.0:

Configured by kaas at Tue Sep 19 14:16:32 MET DST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
    osname=solaris, osvers=2.6, archname=sun4-solaris
    uname='sunos mrs012gv 5.6 generic_105181-21 sun4u sparc sunw,ultra-1 '
    config_args='-Dprefix=/opt/perl-5.6.0-AP618 -Uusemymalloc -Ubincompat5005 -Dcc=gcc -Dlocincpth=/opt/gdbm-1.8.0/include -Dloclibpth=/opt/gdbm-1.8.0/lib -Di_gdbm -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
    cc='gcc', optimize='-O', gccversion=2.95.2 19991024 (release)
    cppflags='-fno-strict-aliasing -I/opt/gdbm-1.8.0/include'
    ccflags ='-fno-strict-aliasing -I/opt/gdbm-1.8.0/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/opt/gdbm-1.8.0/lib '
    libpth=/opt/gdbm-1.8.0/lib /lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc -lcrypt -lsec
    libc=/lib/, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G -L/opt/gdbm-1.8.0/lib'

Locally applied patches:

@INC for perl v5.6.0:

Environment for perl v5.6.0:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
