perl.perl5.porters | Postings from September 2000

[ID 20000921.001] utf8-related mangling of 8-bit data

From: Bart Schuller
Date: September 21, 2000 01:54
Subject: [ID 20000921.001] utf8-related mangling of 8-bit data
Message ID: 20000921105406.A22763@tanglefoot.lunatech.com

This is a bug report for perl from schuller@lunatech.com,
generated with the help of perlbug 1.28 running under perl v5.6.0.


-----------------------------------------------------------------
[Please enter your report here]

We've just observed a bug that unfortunately won't reduce to a simple test
case at this moment. It goes something like this:

    my $text = 'static string';
    print STDERR $somevar;
    #
    # $somevar contains two bytes that, taken together, can be interpreted as
    # one utf8 character. Nowhere in the whole project do we "use utf8"; this
    # script should be able to run on 5.005.
    # The print correctly outputs those two bytes (gibberish when viewed as
    # latin1).

    $text .= $somevar;
    print STDERR $text;
    #
    # This print magically shows not two but four characters with the high bit
    # set. My guess is that they are the utf8-encoded equivalents of the
    # original bytes.
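
The guess above can be checked with a short sketch. It assumes Perl 5.8 or later (for utf8::upgrade and utf8::encode) and assumes, since the report cannot show how, that $text already carried Perl's internal UTF8 flag before the concatenation; the value given to $somevar is a hypothetical stand-in. When one operand of .= is flagged, the other is upgraded as if it were Latin-1, so each byte at or above 0x80 becomes a two-byte UTF-8 sequence in the internal buffer:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $somevar: two raw bytes, 0xC3 0xA9, which
# together also happen to be the UTF-8 encoding of a single character.
my $somevar = "\xC3\xA9";

# Assumption: $text somehow carries the internal UTF8 flag.
# utf8::upgrade forces that state without changing its characters.
my $text = 'static string';
utf8::upgrade($text);

# The byte string is upgraded as Latin-1 before being appended:
# each byte >= 0x80 becomes two bytes in the internal buffer.
$text .= $somevar;

# Copy the internal UTF-8 buffer out as plain bytes to inspect it.
my $internal = $text;
utf8::encode($internal);

my $tail = substr($internal, length('static string'));
printf "characters appended: %d\n", length($text) - length('static string');
printf "internal bytes appended: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $tail;
```

Two characters are appended, but they occupy four bytes internally (C3 83 C2 A9). On 5.6.0, print appears to write that internal buffer out as-is, which would account for the four high-bit characters on STDERR; later perls downgrade such a string when printing it to a handle without a :utf8 layer.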

I would love to post an actual piece of code, but currently this only happens
when the module in question runs under mod_perl.
Are there any debugging tools that would for instance allow me to print the
UTF8-ness of the variables in question?
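
A minimal sketch of two ways to see the flag, as a possible answer to the question above. It assumes Perl 5.8.1 or later for utf8::is_utf8 and utf8::upgrade; on 5.6.0 itself, Devel::Peek's Dump (if it is installed) is the likelier route, since it prints the scalar's FLAGS, which include UTF8 when the flag is set. The helper utf8_flag_report is hypothetical and not part of any module:

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Hypothetical helper: report whether a scalar carries the internal UTF8 flag.
sub utf8_flag_report {
    my ($name, $value) = @_;    # copying preserves the flag
    return sprintf "%s: UTF8 flag %s, %d chars",
        $name, (utf8::is_utf8($value) ? 'on' : 'off'), length $value;
}

my $bytes = "\xC3\xA9";         # plain byte string, flag off
my $chars = "\xC3\xA9";
utf8::upgrade($chars);          # same characters, flag forced on

print utf8_flag_report('bytes', $bytes), "\n";
print utf8_flag_report('chars', $chars), "\n";

# Dump writes the scalar's internals to STDERR; look for UTF8 in FLAGS.
Dump($chars);
```

In Perl 5.8 and later, $bytes eq $chars is true: the flag changes only the internal representation, not the characters, which is why a flag that appears on one operand is hard to spot from ordinary output.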

The code can be made to work correctly by putting a "use bytes" on top.
However, string concatenation changing the data is a bug in my book.

Lastly, the reason this is a perl with ActiveState patches is that we
wanted a 5.6.0 with as few bugs as possible, and 5.6.1 is not out yet. If
5.7.0 is more suitable, I'll try that.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.6.0:

Configured by kaas at Tue Sep 19 14:16:32 MET DST 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=solaris, osvers=2.6, archname=sun4-solaris
    uname='sunos mrs012gv 5.6 generic_105181-21 sun4u sparc sunw,ultra-1 '
    config_args='-Dprefix=/opt/perl-5.6.0-AP618 -Uusemymalloc -Ubincompat5005 -Dcc=gcc -Dlocincpth=/opt/gdbm-1.8.0/include -Dloclibpth=/opt/gdbm-1.8.0/lib -Di_gdbm -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='gcc', optimize='-O', gccversion=2.95.2 19991024 (release)
    cppflags='-fno-strict-aliasing -I/opt/gdbm-1.8.0/include'
    ccflags ='-fno-strict-aliasing -I/opt/gdbm-1.8.0/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    stdchar='unsigned char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/opt/gdbm-1.8.0/lib '
    libpth=/opt/gdbm-1.8.0/lib /lib /usr/lib /usr/ccs/lib
    libs=-lsocket -lnsl -ldl -lm -lc -lcrypt -lsec
    libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-fPIC', lddlflags='-G -L/opt/gdbm-1.8.0/lib'

Locally applied patches:
    ACTIVEPERL_LOCAL_PATCHES_ENTRY

---
@INC for perl v5.6.0:
    /opt/perl-5.6.0-AP618/lib/5.6.0/sun4-solaris
    /opt/perl-5.6.0-AP618/lib/5.6.0
    /opt/perl-5.6.0-AP618/lib/site_perl/5.6.0/sun4-solaris
    /opt/perl-5.6.0-AP618/lib/site_perl/5.6.0
    /opt/perl-5.6.0-AP618/lib/site_perl
    .

---
Environment for perl v5.6.0:
    HOME=/export/home/schuller
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/export/home/schuller/bin:/opt/saxon/bin:/opt/xt/bin:/opt/vim/bin:/opt/sudo/bin:/opt/perl-5.6/bin:/opt/gnu/bin:/opt/openjade/bin:/opt/ghostscript/bin:/opt/apache/ssl/bin:/opt/oracle/bin:/opt/SUNWspro/bin:/opt/FSFrecode/bin:/usr/ccs/bin:/usr/openwin/bin:/usr/xpg4/bin:/bin:/usr/proc/bin:.
    PERL_BADLANG (unset)
    SHELL=/opt/gnu/bin/tcsh



