perl.perl5.porters | Postings from September 2011

[perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: tchrist1
Date: September 26, 2011 13:20
Subject: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: rt-3.6.HEAD-20526-1317068391-1727.100058-75-0@perl.org
# New Ticket Created by  tchrist1 
# Please include the string:  [perl #100058]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=100058 >


Remembering how setting $/ to a reference to an integer (fixed-length record
reads) can cause Perl to erroneously leave broken Perl strings behind
(malformed UTF-8, etc.), I've noticed that you can get this to happen even
more easily than that.
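Assuming, as that earlier case suggests, that a fixed-length record read counts bytes rather than characters, a multi-byte character can be cut in half at the record boundary. A minimal Python sketch of the effect follows; the helper names `fixed_records` and `decodes_cleanly` are hypothetical and have nothing to do with perl's own code.

```python
from typing import List


def fixed_records(data: bytes, size: int) -> List[bytes]:
    """Split data into fixed-size byte records, the way a byte-counting
    record read would, with no regard for character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decodes_cleanly(chunk: bytes) -> bool:
    """True if chunk is complete, well-formed UTF-8 on its own."""
    try:
        chunk.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


encoded = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
for rec in fixed_records(encoded, 4):
    print(rec, decodes_cleanly(rec))
```

Neither record is well-formed UTF-8 by itself: the first ends in a lone lead byte (0xC3) and the second begins with a stray continuation byte (0xA9).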

    % perl -C0 -le 'print "\xC0\x81"' | perl -CS -nle 'printf "U+%v04X\n", $_'
    Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc0) in printf at -e line 1, <> line 1.
    U+0000

    % perl -C0 -le 'print "\xC1\x81"' | perl -CS -nle 'print for length, defined, ord'
    Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in ord at -e line 1, <> line 1.
    1
    1
    0
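The warning text "(2 bytes, need 1, after start byte 0xc1)" describes an overlong sequence: if the bits are decoded anyway, the resulting code point would fit in a single byte. The Python sketch below reproduces that arithmetic for the byte pairs used here; `naive_decode2` and `bytes_needed` are hypothetical helpers that model the UTF-8 bit layout, not perl's actual decoder.

```python
def naive_decode2(b0: int, b1: int) -> int:
    """Decode a 2-byte UTF-8 shape (110xxxxx 10yyyyyy) WITHOUT rejecting
    overlong forms. Raises ValueError if the bytes lack that shape."""
    if (b0 & 0xE0) != 0xC0 or (b1 & 0xC0) != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def bytes_needed(cp: int) -> int:
    """Length of the shortest (and only legal) UTF-8 encoding of cp."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


# The byte pairs used in this report.
for b0, b1 in [(0xC0, 0x81), (0xC1, 0x81), (0xC1, 0x88)]:
    cp = naive_decode2(b0, b1)
    print(f"{b0:02X} {b1:02X} -> U+{cp:04X} (2 bytes, need {bytes_needed(cp)})")
```

So `\xC1\x81` is an overlong spelling of `A` and `\xC1\x88` an overlong `H`. Any 0xC0 or 0xC1 lead byte decodes to a value below 0x80, which is why those two bytes never occur in well-formed UTF-8; as the outputs above show, perl warns and yields 0 for such a character.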

Surely this is an error??  We are actually storing invalid UTF-8 
and yet we are marking it valid:

    % perl -C0 -le 'print "\xC1\x81"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
    SV = PV(0x3c0250e4) at 0x3c04b084
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x3c031920 "\301\201"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
     [UTF8 "\x{0}"]
      CUR = 2
      LEN = 80

    % perl -C0 -le 'print "bad\xC1\x81stuff"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
    SV = PV(0x3c0250e4) at 0x3c04b084
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x3c031920 "bad\301\201stuff"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
     [UTF8 "bad\x{0}stuff"]
      CUR = 10
      LEN = 80

    % perl -C0 -le 'print "bad\xC1\x88stuff"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
    SV = PV(0x3c0250e4) at 0x3c04b084
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x3c031920 "bad\301\210stuff"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
     [UTF8 "bad\x{0}stuff"]
      CUR = 10
      LEN = 80

The UTF8 flag is on, but those bytes are not UTF-8.
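For comparison, a buffer carrying the UTF8 flag would be expected to pass a strict well-formedness check like the one sketched below in Python, which follows the table of well-formed UTF-8 byte sequences in chapter 3 of the Unicode Standard. `is_well_formed_utf8` is a hypothetical helper for illustration; Python's own strict `bytes.decode("utf-8")` applies the same rules.

```python
def is_well_formed_utf8(buf: bytes) -> bool:
    """Strict UTF-8 check per the Unicode well-formed byte-sequence table.
    Rejects overlongs (C0, C1, E0 80..9F, F0 80..8F), UTF-16 surrogates
    (ED A0..BF), values above U+10FFFF (F4 90.., F5..FF) and truncation."""
    i, n = 0, len(buf)
    while i < n:
        b = buf[i]
        if b < 0x80:                      # ASCII
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:             # 2-byte; C0/C1 would be overlong
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                   # exclude 3-byte overlongs
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:                   # exclude surrogates D800..DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif b == 0xF0:                   # exclude 4-byte overlongs
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                   # stay at or below U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                             # 80..BF, C0, C1, F5..FF never lead
            return False
        if i + need >= n:                 # sequence runs off the end
            return False
        if not lo <= buf[i + 1] <= hi:    # second byte has the tight range
            return False
        for k in range(i + 2, i + need + 1):
            if not 0x80 <= buf[k] <= 0xBF:
                return False
        i += need + 1
    return True


for sample in (b"\xC0\x81", b"\xC1\x81", b"bad\xC1\x88stuff", b"caf\xC3\xA9"):
    print(sample, is_well_formed_utf8(sample))
```

All three byte strings from the dumps above fail the check, while a correctly encoded `café` passes.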

I can't see how this isn't a bug, but am willing to be enlightened.

--tom

Summary of my perl5 (revision 5 version 14 subversion 0) configuration:
   
  Platform:
    osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
    uname='openbsd chthon 4.4 generic#0 i386 '
    config_args='-des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=y, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='3.3.5 (propolice)', gccosandvers='openbsd4.4'
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-lgdbm -lm -lutil -lc
    perllibs=-lm -lutil -lc
    libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC  -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP
                        PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
                        USE_PERL_ATOF
  Built under openbsd
  Compiled at Jun 11 2011 11:48:28
  %ENV:
    PERL_UNICODE="SA"
  @INC:
    /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/site_perl/5.14.0
    /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
    /usr/local/lib/perl5/5.14.0
    /usr/local/lib/perl5/site_perl/5.12.3
    /usr/local/lib/perl5/site_perl/5.11.3
    /usr/local/lib/perl5/site_perl/5.10.1
    /usr/local/lib/perl5/site_perl/5.10.0
    /usr/local/lib/perl5/site_perl/5.8.7
    /usr/local/lib/perl5/site_perl/5.8.0
    /usr/local/lib/perl5/site_perl/5.6.0
    /usr/local/lib/perl5/site_perl/5.005
    /usr/local/lib/perl5/site_perl
    .




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org