
[perl #21951] /(^\s]+)/ regexp can incorrectly fail match in utf8 locale

From: perlbug-followup
Date: April 13, 2003 19:41
Subject: [perl #21951] /(^\s]+)/ regexp can incorrectly fail match in utf8 locale
Message ID: rt-21951-55074.17.1021460832976@bugs6.perl.org

# New Ticket Created by  bbaetz@acm.org 
# Please include the string:  [perl #21951]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt2/Ticket/Display.html?id=21951 >



This is a bug report for perl from bbaetz@acm.org,
generated with the help of perlbug 1.34 running under perl v5.8.0.


-----------------------------------------------------------------
[Please enter your report here]

Given:

use CGI qw(header);
print header(-location=>'X');

With a locale of en_AU.UTF-8, I get:

Location="X"
Content-Type: text/html; charset=ISO-8859-1
 
With a LANG of en_AU, I get:

Location: X
Content-Type: text/html; charset=ISO-8859-1

The : is correct; the = isn't.

This is because inside the |header| sub, there is:

    # rearrange() was designed for the HTML portion, so we
    # need to fix it up a little.
    foreach (@other) {
        next unless my($header,$value) = /([^\s=]+)=\"?(.+?)\"?$/;
        ($_ = $header) =~ s/^(\w)(.*)/"\u$1\L$2" . ': '.$self->unescapeHTML($value)/e;
    }

and the first regexp isn't matching when given |location="X"|.
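
For reference, the intended behaviour of that loop body can be sketched on its own. This is a minimal sketch, not CGI.pm's code: `fix_header` and `unescape_html_stub` are hypothetical names, and the stub stands in for `$self->unescapeHTML`, handling only two entities.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $self->unescapeHTML (assumption: only
# &quot; and &amp; are handled, which is enough for this illustration).
sub unescape_html_stub {
    my ($text) = @_;
    $text =~ s/&quot;/"/g;
    $text =~ s/&amp;/&/g;
    return $text;
}

# Apply the same two regexps as the loop quoted above to one string.
sub fix_header {
    my ($line) = @_;

    # First regexp: split name="value" into header name and value.
    my ($header, $value) = $line =~ /([^\s=]+)=\"?(.+?)\"?$/
        or return $line;    # no match: the line is left untouched

    my $clean = unescape_html_stub($value);

    # Second regexp: capitalise the name and join with ': '.
    (my $fixed = $header) =~ s/^(\w)(.*)/"\u$1\L$2" . ': ' . $clean/e;
    return $fixed;
}

print fix_header('location="X"'), "\n";    # prints "Location: X"
```

When the first regexp fails, as it does in the UTF-8 locale, the line is passed through unchanged, which is consistent with the `Location="X"` output shown earlier.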

In the perl debugger, if I break just inside that loop, I get:

  DB<1> p $_;
location="X"
  DB<2> p $_ =~ /\s/;
 
  DB<3> x $_ =~ /(.*)/;
0  'location="X"'
  DB<4> x $_ =~ /(\s)/;
  empty array
  DB<5> x $_ =~ /([^\s])/;
0  'l'
  DB<6> x $_ =~ /([^\s]+)/;
  empty array
  DB<7> x $_ =~ /([^\s][^\s])/;
0  'lo'
  DB<8> x $_ =~ /([^\s]*)/;
0  ''
  DB<9> x 'location="X"' =~ /([^\s]*)/;
0  'location="X"'
  DB<10> x 'location="X"' =~ /([^\s]+)/;
0  'location="X"'
  DB<11> x $_ =~ /([^\s]+)/;
  empty array

Comparing commands 6 and 7: it matches two adjacent non-space characters,
but not one or more (via +), so something is definitely wrong.

I failed to come up with a self-contained testcase though, or a
workaround.
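
A starting point for a standalone check might look like the sketch below. It rests on an assumption the report does not establish: that the failing string carries Perl's internal UTF-8 flag (simulated here with `utf8::upgrade`) while the literal that matched in commands 9 and 10 does not. The helper name `probe` is hypothetical. On a perl without the bug, every probe should match for both strings.

```perl
use strict;
use warnings;

# Run the same patterns as debugger commands 5 to 8 against one string.
# A failed match leaves the corresponding entry undefined.
sub probe {
    my ($s) = @_;
    my %r;
    ($r{one})  = $s =~ /([^\s])/;
    ($r{two})  = $s =~ /([^\s][^\s])/;
    ($r{plus}) = $s =~ /([^\s]+)/;
    ($r{star}) = $s =~ /([^\s]*)/;
    return \%r;
}

my $plain    = 'location="X"';
my $upgraded = $plain;
utf8::upgrade($upgraded);    # ASSUMPTION: the UTF-8 flag is the trigger

for my $case ([plain => $plain], [upgraded => $upgraded]) {
    my ($label, $s) = @$case;
    my $r = probe($s);
    for my $k (qw(one two plus star)) {
        my $shown = defined $r->{$k} ? "'$r->{$k}'" : '(no match)';
        printf "%-8s %-4s %s\n", $label, $k, $shown;
    }
}
```

If the `upgraded` rows differ from the `plain` rows on an affected build, that would narrow the problem to the UTF-8 code path of the regexp engine; if they do not, the trigger lies elsewhere.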

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.8.0:

Configured by bhcompile at Tue Feb 18 22:17:47 EST 2003.

Summary of my perl5 (revision 5.0 version 8 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.4.20-2.48smp, archname=i386-linux-thread-multi
    uname='linux str'
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -g -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dvendorprefix=/usr -Dsiteprefix=/usr -Dotherlibdirs=/usr/lib/perl5/5.8.0 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=
    useperlio= d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=un uselongdouble=
    usemymalloc=, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBUGGING -fno-strict-aliasing -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.2.2 20030213 (Red Hat Linux 8.0 3.2.2-1)', gccosandvers=''
    intsize=e, longsize= , ptrsize=p, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=, Off_t='', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lpthread -lc -lcrypt -lutil
    perllibs=
    libc=/lib/libc-2.3.1.so, so=so, useshrplib=true, libperl=libper
    gnulibc_version='2.3.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic -Wl,-rpath,/usr/lib/perl5/5.8.0/i386-linux-thread-multi/CORE'
    cccdlflags='-fPIC', lddlflags='s Unicode/Normalize XS/A'

Locally applied patches:
    MAINT18379

---
@INC for perl v5.8.0:
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/site_perl/5.8.0
    /usr/lib/perl5/site_perl
    /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/vendor_perl/5.8.0
    /usr/lib/perl5/vendor_perl
    /usr/lib/perl5/5.8.0/i386-linux-thread-multi
    /usr/lib/perl5/5.8.0
    .

---
Environment for perl v5.8.0:
    HOME=/home/bbaetz
    LANG=en_AU.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/bbaetz/swtest/bin:/home/bbaetz/bin:/home/bbaetz/swtest/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/usr/local/wordnet1.6/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



