perl.perl5.porters | Postings from July 2005

[perl #36691] string captured into $1, $2,... in pattern match sometimes have utf8 bit "on" and sometimes "off". Even for the same patternmatch arguments

From: Andy Maas
Date: July 29, 2005 22:08
Subject: [perl #36691] string captured into $1, $2,... in pattern match sometimes have utf8 bit "on" and sometimes "off". Even for the same patternmatch arguments
Message ID: rt-3.0.11-36691-118498.18.1710750357086@perl.org
# New Ticket Created by  "Andy Maas" 
# Please include the string:  [perl #36691]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=36691 >


This is a bug report for perl from amaas@proofpoint.com,

generated with the help of perlbug 1.35 running under perl v5.8.6.

-----------------------------------------------------------------

[Please enter your report here]

 

I observed two related problems.

1. Strings captured into $1, $2, ... by a pattern match sometimes have
   their utf8 flag on and sometimes off.

   This is true even when the pattern match has the same arguments.

   It looks as though the behavior depends on some internal mode that
   makes it behave one way or the other.

   I identified one way to make it behave the first way (the utf8 flag
   is turned on for the captures when the matched string contains 8-bit
   data): using the HTTP::DAV module to access a DAV resource.

   I also identified a way to make it behave the other way: calling
   "Encode::is_utf8($2)" before the pattern match.

2. The call to "Encode::is_utf8($2)" actually resets the utf8 flag of
   $2, which seems odd, since the call is supposed to do nothing but
   read the utf8 flag status of its argument.
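To separate the regex behaviour from whatever HTTP::DAV does, a network-free control may help. This is a minimal sketch under an assumption that is not stated in the report: that the utf8 flag on $2 should simply follow the internal representation of the matched string. It matches the same two bytes once as a plain byte string and once after utf8::upgrade, and records the flag of $2 in each case. The hash %capture_is_utf8 and the labels "bytes"/"chars" are illustrative names only.

```perl
use strict;
use warnings;

# The same two bytes as the report, stored two ways.
my $bytes = pack("H*", "d0cf");   # byte string, utf8 flag off
my $chars = $bytes;
utf8::upgrade($chars);            # same characters, utf8 flag on

# Hypothetical bookkeeping: label => utf8 flag of $2 after the match.
my %capture_is_utf8;
for my $pair ([bytes => $bytes], [chars => $chars]) {
    my ($label, $target) = @$pair;   # copying preserves the utf8 flag
    $target =~ /^()(.*)/s or die "pattern did not match";
    $capture_is_utf8{$label} = utf8::is_utf8($2) ? 1 : 0;
    printf "%s: target is_utf8=%d, \$2 is_utf8=%d\n",
        $label, (utf8::is_utf8($target) ? 1 : 0), $capture_is_utf8{$label};
}
```

If the assumption holds, the byte-string case should report 0 and the upgraded case 1 on every run, whereas CASE 2 of the script reportedly shows 1 even though $data itself was never upgraded.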

 

This is the script I use for testing:

 

 

use strict;
use warnings;

use Encode;
use HTTP::DAV;

# Two 8-bit bytes (0xD0 0xCF); the utf8 flag is off on $data.
my $data = pack("H*", "d0cf");

print "---------------------------------------------------------------------\n";
print "Start CASE 1. utf8 bit is not set on \$2\n";

$data =~ /^()(.*)/;
print "is_utf8=" . (utf8::is_utf8($2) ? 1 : 0) . "\n";

# no warning here
my $v = $2 & "abc";

print "\n\n---------------------------------------------------------------------\n";
print "Start CASE 2. using the HTTP::DAV module causes the utf8 bit to be set on \$2\n";

# this triggers the warning in the later code
my $dav = HTTP::DAV->new;
$dav->credentials(-user => "devcenter", -pass => "rocks",
                  -url  => "http://webdav.silverstream.com/Director/WebDAVService/main");
$dav->open("http://webdav.silverstream.com/Director/WebDAVService/main");

# uncomment the following line to see Encode::is_utf8 clear the utf8 bit of $2
#Encode::is_utf8($2);

$data =~ /^()(.*)/;
print "is_utf8=" . (utf8::is_utf8($2) ? 1 : 0) . "\n";

# warning here
$v = $2 & "abc";
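Until the flag behaves consistently, one possible workaround (an assumption, not something proposed in this report) is to copy the capture into a lexical and force it back to a byte string with utf8::downgrade before the bitwise &. utf8::downgrade dies if any character is above 0xFF, which should not happen for data that started as pack("H*", ...) bytes. This sketch simulates the CASE 2 state with utf8::upgrade instead of HTTP::DAV so it runs offline; $chars and $capture are hypothetical names.

```perl
use strict;
use warnings;

# Same bytes as the report; utf8::upgrade stands in for whatever
# HTTP::DAV did to turn the flag on in CASE 2 (an assumption).
my $chars = pack("H*", "d0cf");
utf8::upgrade($chars);

$chars =~ /^()(.*)/s or die "pattern did not match";

my $capture = $2;            # copy out of the capture variable first
utf8::downgrade($capture);   # back to bytes; dies on characters above 0xFF
my $v = $capture & "abc";    # bitwise AND now operates on plain bytes

printf "is_utf8=%d length=%d\n", (utf8::is_utf8($capture) ? 1 : 0), length $v;
```

The copy matters: it keeps the flag manipulation away from the capture variable itself, which point 2 of the report suggests is where the surprising behaviour lives.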

[Please do not change anything below this line]

-----------------------------------------------------------------

---

Flags:

    category=core

    severity=medium

---

Site configuration information for perl v5.8.6:

 

Configured by marcel at Mon Nov 29 12:06:34 PST 2004.

 

Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
  Platform:
    osname=linux, osvers=2.4.20-28.8smp, archname=i686-linux-thread-multi
    uname='linux xenon2 2.4.20-28.8smp #1 smp thu dec 18 12:25:21 est 2003 i686 i686 i386 gnulinux '
    config_args='-Dmksymlinks -Dcc=gcc -Dprefix=/tools/x/perl-5.8.6 -Uinstallusrbinperl -Dusethreads -de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I/usr/include/gdbm',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe -I/usr/local/include -I/usr/include/gdbm'
    ccversion='', gccversion='3.3.3', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=/lib/libc-2.3.2.so, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version='2.3.2'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

 

Locally applied patches:

 

 

---

@INC for perl v5.8.6:

    /tools/x/perl-5.8.6/lib/5.8.6/i686-linux-thread-multi

    /tools/x/perl-5.8.6/lib/5.8.6

    /tools/x/perl-5.8.6/lib/site_perl/5.8.6/i686-linux-thread-multi

    /tools/x/perl-5.8.6/lib/site_perl/5.8.6

    /tools/x/perl-5.8.6/lib/site_perl

    .

 

---

Environment for perl v5.8.6:

    HOME=/u/amaas

    LANG=en_US.ISO8859-1

    LANGUAGE (unset)

    LD_LIBRARY_PATH (unset)

    LOGDIR (unset)

    PATH=/tools/x/bin:/usr/bin::/u/amaas/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/u/amaas/software/apache-ant-1.6.1/bin:/u/amaas/bin

    PERL_BADLANG (unset)

    SHELL=/bin/bash



