
[perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set

From: Father Chrysostomos via RT
Date: September 26, 2011 13:25
Subject: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: rt-3.6.HEAD-20526-1317068706-1075.100058-15-0@perl.org

On Mon Sep 26 13:19:50 2011, tom christiansen wrote:
> Remembering how setting $/ to an int ref can cause Perl to erroneously leave
> broken Perl strings (malformed UTF-8, etc), I've noticed that you can get
> this to happen even more easily than that.
> 
>     % perl -C0 -le 'print "\xC0\x81"' | perl -CS -nle 'printf
> "U+%v04X\n", $_'
>     Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc0)
> in printf at -e line 1, <> line 1.
>     U+0000
> 
>     % perl -C0 -le 'print "\xC1\x81"' | perl -CS -nle 'print for
> length, defined, ord'
>     Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1)
> in ord at -e line 1, <> line 1.
>     1
>     1
>     0
> 
> Surely this is an error??  We are actually storing invalid UTF-8
> and yet we are marking it valid:
> 
>     % perl -C0 -le 'print "\xC1\x81"' | perl -MDevel::Peek -CS -nle
> 'Dump($_)'
>     SV = PV(0x3c0250e4) at 0x3c04b084
>       REFCNT = 1
>       FLAGS = (POK,pPOK,UTF8)
>       PV = 0x3c031920 "\301\201"\0Malformed UTF-8 character (2 bytes,
> need 1, after start byte 0xc1) in subroutine entry at -e line 1, <>
> line 1.
>      [UTF8 "\x{0}"]
>       CUR = 2
>       LEN = 80
> 
>     % perl -C0 -le 'print "bad\xC1\x81stuff"' | perl -MDevel::Peek -CS
> -nle 'Dump($_)'
>     SV = PV(0x3c0250e4) at 0x3c04b084
>       REFCNT = 1
>       FLAGS = (POK,pPOK,UTF8)
>       PV = 0x3c031920 "bad\301\201stuff"\0Malformed UTF-8 character (2
> bytes, need 1, after start byte 0xc1) in subroutine entry at -e line
> 1, <> line 1.
>      [UTF8 "bad\x{0}stuff"]
>       CUR = 10
>       LEN = 80
> 
>     % perl -C0 -le 'print "bad\xC1\x88stuff"' | perl -MDevel::Peek -CS
> -nle 'Dump($_)'
>     SV = PV(0x3c0250e4) at 0x3c04b084
>       REFCNT = 1
>       FLAGS = (POK,pPOK,UTF8)
>       PV = 0x3c031920 "bad\301\210stuff"\0Malformed UTF-8 character (2
> bytes, need 1, after start byte 0xc1) in subroutine entry at -e line
> 1, <> line 1.
>      [UTF8 "bad\x{0}stuff"]
>       CUR = 10
>       LEN = 80
> 
> The UTF8 flag is on, but that is not UTF8.
> 
> I can't see how this isn't a bug, but am willing to be enlightened.

I think it was agreed some time ago that this is a bug.  The utf8 layer
should at least check for well-formedness (meaning that it produces a
valid perl scalar), even if it does not check for strict UTF-8 (disallowing
certain code points), the latter being a matter of controversy.
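
Until the layer itself checks, a script can read raw octets and decode
them explicitly, so that malformed input is rejected instead of ending
up in a scalar with the UTF8 flag on.  The following is only a minimal
sketch, not a description of how the layer should be fixed: the helper
name decode_checked is made up for illustration, and it uses Encode's
strict "UTF-8" encoding, which is stricter than the plain
well-formedness check discussed above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of any module): return the decoded
# character string, or undef if the octets are not valid UTF-8.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps the
# caller's octet string untouched.
sub decode_checked {
    my ($octets) = @_;
    my $chars = eval {
        Encode::decode('UTF-8', $octets,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;    # undef when decode() croaked
}

# One malformed sample from the report and one well-formed string.
for my $octets ("bad\xC1\x81stuff", "caf\xC3\xA9") {
    my $hex   = join ' ', map { sprintf '%02X', ord($_) } split //, $octets;
    my $chars = decode_checked($octets);
    print "$hex: ",
        (defined $chars ? 'decoded, ' . length($chars) . ' characters'
                        : 'rejected'),
        "\n";
}
```

For input read from a pipe as in the report, that would mean running
the reader with -C0 (or a :raw handle) and passing each line through
such a helper rather than relying on -CS.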

> 
> --tom
> 
> Summary of my perl5 (revision 5 version 14 subversion 0)
> configuration:
> 
>   Platform:
>     osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
>     uname='openbsd chthon 4.4 generic#0 i386 '
>     config_args='-des'
>     hint=recommended, useposix=true, d_sigaction=define
>     useithreads=undef, usemultiplicity=undef
>     useperlio=define, d_sfio=undef, uselargefiles=define,
> usesocks=undef
>     use64bitint=undef, use64bitall=undef, uselongdouble=undef
>     usemymalloc=y, bincompat5005=undef
>   Compiler:
>     cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector
> -I/usr/local/include',
>     optimize='-O2',
>     cppflags='-fno-strict-aliasing -pipe -fstack-protector
> -I/usr/local/include'
>     ccversion='', gccversion='3.3.5 (propolice)',
> gccosandvers='openbsd4.4'
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>     d_longlong=define, longlongsize=8, d_longdbl=define,
> longdblsize=12
>     ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
>     alignbytes=4, prototype=define
>   Linker and Libraries:
>     ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
>     libpth=/usr/local/lib /usr/lib
>     libs=-lgdbm -lm -lutil -lc
>     perllibs=-lm -lutil -lc
>     libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false,
> libperl=libperl.a
>     gnulibc_version=''
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
>     cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC
> -L/usr/local/lib -fstack-protector'
> 
> 
> Characteristics of this binary (from libperl):
>   Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV
> PERL_MALLOC_WRAP
>                         PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
>                         USE_PERL_ATOF
>   Built under openbsd
>   Compiled at Jun 11 2011 11:48:28
>   %ENV:
>     PERL_UNICODE="SA"
>   @INC:
>     /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
>     /usr/local/lib/perl5/site_perl/5.14.0
>     /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
>     /usr/local/lib/perl5/5.14.0
>     /usr/local/lib/perl5/site_perl/5.12.3
>     /usr/local/lib/perl5/site_perl/5.11.3
>     /usr/local/lib/perl5/site_perl/5.10.1
>     /usr/local/lib/perl5/site_perl/5.10.0
>     /usr/local/lib/perl5/site_perl/5.8.7
>     /usr/local/lib/perl5/site_perl/5.8.0
>     /usr/local/lib/perl5/site_perl/5.6.0
>     /usr/local/lib/perl5/site_perl/5.005
>     /usr/local/lib/perl5/site_perl
>     .





