Front page | perl.perl5.porters |
Postings from September 2011
[perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
From: Father Chrysostomos via RT
Date: September 26, 2011 13:25
Subject: [perl #100058] Perl leaves broken UTF-8 in SVs whose UTF8 is set
Message ID: rt-3.6.HEAD-20526-1317068706-1075.100058-15-0@perl.org
On Mon Sep 26 13:19:50 2011, tom christiansen wrote:
> Remembering how setting $/ to an int ref can cause Perl to erroneously
> leave broken Perl strings (malformed UTF-8, etc.), I've noticed that you
> can get this to happen even more easily than that.
>
> % perl -C0 -le 'print "\xC0\x81"' | perl -CS -nle 'printf "U+%v04X\n", $_'
> Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc0) in printf at -e line 1, <> line 1.
> U+0000
>
> % perl -C0 -le 'print "\xC1\x81"' | perl -CS -nle 'print for length, defined, ord'
> Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in ord at -e line 1, <> line 1.
> 1
> 1
> 0
>
> Surely this is an error?? We are actually storing invalid UTF-8
> and yet we are marking it valid:
>
> % perl -C0 -le 'print "\xC1\x81"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
> SV = PV(0x3c0250e4) at 0x3c04b084
> REFCNT = 1
> FLAGS = (POK,pPOK,UTF8)
> PV = 0x3c031920 "\301\201"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
> [UTF8 "\x{0}"]
> CUR = 2
> LEN = 80
>
> % perl -C0 -le 'print "bad\xC1\x81stuff"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
> SV = PV(0x3c0250e4) at 0x3c04b084
> REFCNT = 1
> FLAGS = (POK,pPOK,UTF8)
> PV = 0x3c031920 "bad\301\201stuff"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
> [UTF8 "bad\x{0}stuff"]
> CUR = 10
> LEN = 80
>
> % perl -C0 -le 'print "bad\xC1\x88stuff"' | perl -MDevel::Peek -CS -nle 'Dump($_)'
> SV = PV(0x3c0250e4) at 0x3c04b084
> REFCNT = 1
> FLAGS = (POK,pPOK,UTF8)
> PV = 0x3c031920 "bad\301\210stuff"\0Malformed UTF-8 character (2 bytes, need 1, after start byte 0xc1) in subroutine entry at -e line 1, <> line 1.
> [UTF8 "bad\x{0}stuff"]
> CUR = 10
> LEN = 80
>
> The UTF8 flag is on, but that is not UTF8.
>
> I can't see how this isn't a bug, but am willing to be enlightened.
I think it was agreed some time ago that this is a bug. The utf8 layer
should at least check for well-formedness (meaning that it produces a
valid perl scalar), even if it does not check for strict UTF-8
(disallowing certain code points), the latter being a matter of
controversy.
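
For illustration only, here is a minimal sketch of what a well-formedness
check looks like when done in user code rather than in the layer: read the
input as raw bytes and decode it explicitly with Encode, so that malformed
sequences are rejected instead of ending up in a scalar with the UTF8 flag
set. The helper name decode_checked is hypothetical, and this is not a
proposed fix for the utf8 layer itself. Note also that Encode's "UTF-8" is
the strict variant, so it rejects more than mere malformedness.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns the decoded character string, or undef
# if the byte string cannot be decoded as UTF-8.
sub decode_checked {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $str  = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $str;          # undef if decode() croaked
}

# The first string is one of the malformed inputs from the report;
# the second is well-formed UTF-8 for "caf" followed by U+00E9.
for my $bytes ("bad\xC1\x81stuff", "caf\xC3\xA9") {
    my $str = decode_checked($bytes);
    print defined $str ? "ok: length " . length($str) : "rejected", "\n";
}
```

When reading from a handle, the same effect can usually be had by opening
it with the :encoding(UTF-8) layer instead of :utf8 (which is what -CS
applies to the standard handles).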
>
> --tom
>
> Summary of my perl5 (revision 5 version 14 subversion 0)
> configuration:
>
> Platform:
> osname=openbsd, osvers=4.4, archname=OpenBSD.i386-openbsd
> uname='openbsd chthon 4.4 generic#0 i386 '
> config_args='-des'
> hint=recommended, useposix=true, d_sigaction=define
> useithreads=undef, usemultiplicity=undef
> useperlio=define, d_sfio=undef, uselargefiles=define,
> usesocks=undef
> use64bitint=undef, use64bitall=undef, uselongdouble=undef
> usemymalloc=y, bincompat5005=undef
> Compiler:
> cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector
> -I/usr/local/include',
> optimize='-O2',
> cppflags='-fno-strict-aliasing -pipe -fstack-protector
> -I/usr/local/include'
> ccversion='', gccversion='3.3.5 (propolice)',
> gccosandvers='openbsd4.4'
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> d_longlong=define, longlongsize=8, d_longdbl=define,
> longdblsize=12
> ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='cc', ldflags ='-Wl,-E -fstack-protector -L/usr/local/lib'
> libpth=/usr/local/lib /usr/lib
> libs=-lgdbm -lm -lutil -lc
> perllibs=-lm -lutil -lc
> libc=/usr/lib/libc.so.48.0, so=so, useshrplib=false,
> libperl=libperl.a
> gnulibc_version=''
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
> cccdlflags='-DPIC -fPIC ', lddlflags='-shared -fPIC
> -L/usr/local/lib -fstack-protector'
>
>
> Characteristics of this binary (from libperl):
> Compile-time options: MYMALLOC PERL_DONT_CREATE_GVSV
> PERL_MALLOC_WRAP
> PERL_PRESERVE_IVUV USE_LARGE_FILES USE_PERLIO
> USE_PERL_ATOF
> Built under openbsd
> Compiled at Jun 11 2011 11:48:28
> %ENV:
> PERL_UNICODE="SA"
> @INC:
> /usr/local/lib/perl5/site_perl/5.14.0/OpenBSD.i386-openbsd
> /usr/local/lib/perl5/site_perl/5.14.0
> /usr/local/lib/perl5/5.14.0/OpenBSD.i386-openbsd
> /usr/local/lib/perl5/5.14.0
> /usr/local/lib/perl5/site_perl/5.12.3
> /usr/local/lib/perl5/site_perl/5.11.3
> /usr/local/lib/perl5/site_perl/5.10.1
> /usr/local/lib/perl5/site_perl/5.10.0
> /usr/local/lib/perl5/site_perl/5.8.7
> /usr/local/lib/perl5/site_perl/5.8.0
> /usr/local/lib/perl5/site_perl/5.6.0
> /usr/local/lib/perl5/site_perl/5.005
> /usr/local/lib/perl5/site_perl
> .