
[perl #69414] Case-insensitive utf8 matching problem

From: Christoph Bussenius
Date: September 26, 2009 13:48
Subject: [perl #69414] Case-insensitive utf8 matching problem
Message ID: rt-3.6.HEAD-21832-1253990653-1551.69414-75-0@perl.org
# New Ticket Created by  Christoph Bussenius 
# Please include the string:  [perl #69414]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=69414 >


This is a bug report for perl from Christoph Bussenius <pepe@cpan.org>,
generated with the help of perlbug 1.35 running under perl v5.8.8.


-----------------------------------------------------------------

If a regular expression is matched case-insensitively against a utf8-upgraded
string, the case matching is usually done correctly with respect to Unicode
case semantics, i.e.

  my $str = "hä"; utf8::upgrade($str); $str =~ /HÄ/i

is true.  However I found that

  my $str = "hä"; utf8::upgrade($str); $str =~ /Ä/i

is false (only the regexes differ), which I believe to be a bug.

Because these one-liners only work if the source file is Latin-1 encoded, I
wrote a more portable version that hex-encodes the literals.  Its second test
fails because of the bug.
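
To illustrate what hex-encoding and upgrading do to the literals, here is a
minimal sketch (the variable names $native and $upgraded are chosen for
illustration only).  It assumes nothing beyond core pack() and the built-in
utf8:: functions.  The upgraded string compares equal to the original; only
the internal representation differs.

```perl
use strict;
use warnings;

# Build "hä" from hex so the script does not depend on the source encoding.
my $native   = pack('H*', '68e4');   # two bytes: 0x68 0xE4
my $upgraded = $native;
utf8::upgrade($upgraded);            # same characters, stored as UTF-8 internally

# The two strings are the same as far as Perl-level comparison is concerned.
printf "equal strings:    %s\n", ($native eq $upgraded        ? 'yes' : 'no');
printf "length (chars):   %d\n", length($upgraded);

# Only the internal UTF8 flag tells them apart.
printf "native is utf8:   %s\n", (utf8::is_utf8($native)      ? 'yes' : 'no');
printf "upgraded is utf8: %s\n", (utf8::is_utf8($upgraded)    ? 'yes' : 'no');
```

Since the strings are logically identical, one would expect a regex match to
give the same answer whichever representation is used.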

This has been tested on 5.8.8, 5.10.0 and blead
(commit 663bfafc78cf049036e7391ba11385234dcbe9ed).


use strict;
use warnings;
use Devel::Peek;
use Test::More tests => 3;

my $lower  = pack('H*', '68e4');  # hä
my $upper  = pack('H*', '48c4');  # HÄ
my $Auml   = pack('H*', 'c4');    # Ä
my $Auml2  = pack('H*', 'c4');    # Ä
utf8::upgrade($lower);
utf8::upgrade($Auml2);

warn "hä\n";
Dump($lower);
warn "HÄ\n";
Dump($upper);
warn "Ä\n";
Dump($Auml);
warn "Ä -- upgraded\n";
Dump($Auml2);

# We search for three regexes within the upgraded string "hä", ignoring
# case.

ok($lower =~ /$upper/i, 'the full string in upper-case is found');
ok($lower =~ /$Auml/i,  'the single upper-case umlaut should be found'); # FAILS
ok($lower =~ /$Auml2/i, 'it is found if it is utf8-encoded');
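
Going by the third test, a possible workaround is to upgrade the pattern
string as well before interpolating it.  The following is only a sketch under
that assumption: contains_ci is a hypothetical helper, not an existing
function, and like the tests above it interpolates the needle as a pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive search with both operands upgraded
# first, so that they share the same internal representation (this mirrors
# the third test, which passes).
sub contains_ci {
    my ($haystack, $needle) = @_;
    utf8::upgrade($haystack);    # modifies the local copies only
    utf8::upgrade($needle);
    return $haystack =~ /$needle/i ? 1 : 0;
}

my $lower = pack('H*', '68e4');   # hä
my $Auml  = pack('H*', 'c4');     # Ä

print contains_ci($lower, $Auml) ? "found\n" : "not found\n";
```

This sidesteps the problem in calling code; it does not address the mismatch
in the regex engine itself.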


-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl v5.8.8:

Configured by Gentoo at Sat Aug 29 04:16:12 CEST 2009.

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.26.2, archname=i686-linux
    uname='linux cyan 2.6.26.2 #2 preempt sun jul 12 02:27:27 cest 2009 i686 intel(r) pentium(r) m processor 2.00ghz genuineintel gnulinux '
    config_args='-des -Darchname=i686-linux -Dcccdlflags=-fPIC -Dccdlflags=-rdynamic -Dcc=i686-pc-linux-gnu-gcc -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr -Dlocincpth=  -Doptimize=-O2 -fomit-frame-pointer -march=pentium-m -pipe -Duselargefiles -Dd_semctl_semun -Dscriptdir=/usr/bin -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dinstallman1dir=/usr/share/man/man1 -Dinstallman3dir=/usr/share/man/man3 -Dman1ext=1 -Dman3ext=3pm -Dinc_version_list=5.8.0 5.8.0/i686-linux 5.8.2 5.8.2/i686-linux 5.8.4 5.8.4/i686-linux 5.8.5 5.8.5/i686-linux 5.8.6 5.8.6/i686-linux 5.8.7 5.8.7/i686-linux  -Dcf_by=Gentoo -Ud_csh -Dusenm -Di_ndbm -Di_gdbm -Di_db'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='i686-pc-linux-gnu-gcc', ccflags ='-fno-strict-aliasing -pipe -Wdeclaration-after-statement -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -fomit-frame-pointer -march=pentium-m -pipe',
    cppflags='-fno-strict-aliasing -pipe -Wdeclaration-after-statement'
    ccversion='', gccversion='4.4.1', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='i686-pc-linux-gnu-gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lpthread -lnsl -lndbm -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=/lib/libc-2.10.1.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.8:
    /home/pepe/root/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/site_perl/5.8.8
    /home/pepe/root/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/5.8.8
    /home/pepe/root/perl/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/site_perl/5.8.8
    /home/pepe/root/perl/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/5.8.8
    /home/pepe/root/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/site_perl/5.8.8
    /home/pepe/root/lib/perl5/site_perl
    /home/pepe/root/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/lib/perl5/5.8.8
    /home/pepe/root/lib/perl5
    /home/pepe/root/perl/lib/perl5/site_perl/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/site_perl/5.8.8
    /home/pepe/root/perl/lib/perl5/site_perl
    /home/pepe/root/perl/lib/perl5/5.8.8/i686-linux
    /home/pepe/root/perl/lib/perl5/5.8.8
    /home/pepe/root/perl/lib/perl5/i686-linux
    /home/pepe/root/perl/lib/perl5
    /etc/perl
    /usr/lib/perl5/vendor_perl/5.8.8/i686-linux
    /usr/lib/perl5/vendor_perl/5.8.8
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/site_perl/5.8.8/i686-linux
    /usr/lib/perl5/site_perl/5.8.8
    /usr/lib/perl5/site_perl/5.8.7
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/5.8.8/i686-linux
    /usr/lib/perl5/5.8.8
    /usr/local/lib/site_perl
    .

---
Environment for perl v5.8.8:
    HOME=/home/pepe
    LANG=C
    LANGUAGE (unset)
    LC_ALL=en_US.ISO8859-1
    LD_LIBRARY_PATH=/home/pepe/.lib:/home/pepe/root/lib
    LOGDIR (unset)
    PATH=/usr/lib/colorgcc/bin:/home/pepe/root/perl/bin:/home/pepe/root/bin:/home/pepe/.bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/4.4.1:/opt/blackdown-jdk-1.4.2.03/bin:/opt/blackdown-jdk-1.4.2.03/jre/bin:/usr/kde/3.5/bin:/usr/qt/3/bin:/usr/games/bin:/usr/local/perl/bin
    PERL5LIB=/home/pepe/root/lib/perl5/site_perl/5.8.8/i686-linux:/home/pepe/root/lib/perl5/5.8.8/i686-linux:/home/pepe/root/perl/lib/perl5/site_perl/5.8.8/i686-linux:/home/pepe/root/perl/lib/perl5/5.8.8/i686-linux:/home/pepe/root/lib/perl5/site_perl/5.8.8:/home/pepe/root/lib/perl5/5.8.8:/home/pepe/root/perl/lib/perl5/site_perl/5.8.8:/home/pepe/root/perl/lib/perl5/5.8.8:/home/pepe/root/lib/perl5/site_perl:/home/pepe/root/lib/perl5:/home/pepe/root/perl/lib/perl5/site_perl:/home/pepe/root/perl/lib/perl5
    PERL_BADLANG (unset)
    SHELL=/bin/zsh

