perl.perl5.porters | Postings from June 2008

[perl #55250] utf-8 regex case insensitive character classes mishandle non-utf8 strings

From: John Gardiner Myers
Date: June 4, 2008 01:44
Subject: [perl #55250] utf-8 regex case insensitive character classes mishandle non-utf8 strings
Message ID: rt-3.6.HEAD-11257-1212529954-1057.55250-75-0@perl.org
# New Ticket Created by  John Gardiner Myers 
# Please include the string:  [perl #55250]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=55250 >



This is a bug report for perl from jgmyers@proofpoint.com,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------
[Please enter your report here]

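UTF-8 regular expressions (such as those using the \x{400}-\x{4ff}
range below) with case-insensitive character classes incorrectly
parse non-UTF-8 subject strings as if they were UTF-8.  The result is
spurious "Malformed UTF-8 character" warnings and, in one case, a
failed match.  This bug reproduces in both 5.8.8 and 5.10.0.  Test
cases follow: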

use strict;
use warnings;
use Test::Warn;
use Test::More qw(no_plan);

warnings_are {ok("\xa9" !~ /[\x{400}-\x{4ff}]/i)} [], "no warnings";
warnings_are {ok("\xc0" =~ /^[\x{400}-\x{4ff}\xc0]/i)} [], "no warnings";
warnings_are {ok("\xe0" =~ /^[\x{400}-\x{4ff}\xc0]/i)} [], "no warnings";


This incorrectly produces the output:

ok 1
not ok 2 - no warnings
#   Failed test 'no warnings'
#   in /u/jgmyers/nonutf8.t at line 6.
# found warning: Malformed UTF-8 character (unexpected continuation byte 0xa9, with no preceding start byte) in pattern match (m//) at /u/jgmyers/nonutf8.t line 6.
# didn't expect to find a warning
ok 3
not ok 4 - no warnings
#   Failed test 'no warnings'
#   in /u/jgmyers/nonutf8.t at line 7.
# found warning: Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xc0) in pattern match (m//) at /u/jgmyers/nonutf8.t line 7.
# didn't expect to find a warning
not ok 5
#   Failed test in /u/jgmyers/nonutf8.t at line 8.
not ok 6 - no warnings
#   Failed test 'no warnings'
#   in /u/jgmyers/nonutf8.t at line 8.
# found warning: Malformed UTF-8 character (unexpected non-continuation byte 0x00, immediately after start byte 0xe0) in pattern match (m//) at /u/jgmyers/nonutf8.t line 8.
# didn't expect to find a warning
1..6
# Looks like you failed 4 tests of 6.
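
The following is a minimal sketch, not part of the original test file, for checking that the subject strings above really are non-UTF-8 internally and for comparing the match against an upgraded copy of the same string. The helper name describe is hypothetical. The sketch assumes that a string upgraded with utf8::upgrade is a useful point of comparison; the report itself does not confirm how the upgraded string behaves on affected versions.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a string carries Perl's internal
# UTF8 flag and whether it matches the character class from the report.
sub describe {
    my ($label, $str) = @_;
    my $flag  = utf8::is_utf8($str) ? 'utf8' : 'non-utf8';
    my $match = ($str =~ /^[\x{400}-\x{4ff}\xc0]/i) ? 'match' : 'no match';
    return "$label: $flag, $match";
}

my $bytes = "\xc0";           # one character, stored as one byte, UTF8 flag off

my $upgraded = $bytes;
utf8::upgrade($upgraded);     # same character, now stored internally as UTF-8

print describe('original', $bytes),    "\n";
print describe('upgraded', $upgraded), "\n";
```

Both strings compare equal with eq, so any difference between the two printed lines would come from the internal representation alone.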

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=medium
---
Site configuration information for perl v5.8.8:

Configured by jthaler at Tue May  6 14:16:13 PDT 2008.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.4.21-47.0.1.elsmp, archname=i686-linux-thread-multi
    uname='linux xenon3 2.4.21-47.0.1.elsmp #1 smp thu oct 19 11:33:45 edt 2006 i686 gnulinux '
    config_args='-de -Dprefix=/tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8 -Dcc=gcc -Uinstallusrbinperl -Dusethreads -Dlibpth=/tools/x2/gcc-4.2.2-pps-5.5/lib /lib /usr/lib -Dlocincpth=/tools/x2/gcc-4.2.2-pps-5.5/include -Dloclibpth=/tools/x2/gcc-4.2.2-pps-5.5/lib -Dcf_email=xtools@proofpoint.com -Di_gdbm=/tools/x2/gcc-4.2.2-pps-5.5/gdbm/include -Dusemallocwrap=n'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/tools/x2/gcc-4.2.2-pps-5.5/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -Wdeclaration-after-statement -I/tools/x2/gcc-4.2.2-pps-5.5/include -I/usr/include/gdbm'
    ccversion='', gccversion='4.2.2', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/tools/x2/gcc-4.2.2-pps-5.5/lib'
    libpth=/tools/x2/gcc-4.2.2-pps-5.5/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/tools/x2/gcc-4.2.2-pps-5.5/lib'

Locally applied patches:


---
@INC for perl v5.8.8:
    /tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8/lib/5.8.8/i686-linux-thread-multi
    /tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8/lib/5.8.8
    
/tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8/lib/site_perl/5.8.8/i686-linux-thread-multi
    /tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8/lib/site_perl/5.8.8
    /tools/x2/gcc-4.2.2-pps-5.5/perl-5.8.8/lib/site_perl
    .

---
Environment for perl v5.8.8:
    HOME=/u/jgmyers
    LANG=en_US.utf8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    
PATH=/tools/x/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/u/jgmyers/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash


