[ID 20010426.002] Word boundary regex treated differently by 5.6 and 5.00503
From: Bens maintenance account
Date: April 26, 2001 10:42
Subject: [ID 20010426.002] Word boundary regex treated differently by 5.6 and 5.00503
Message ID: 200104261733.f3QHX7K10625@mail.ocentrix.net
This is a bug report for perl from benwa@ocentrix.com,
generated with the help of perlbug 1.28 running under perl v5.6.0.
-----------------------------------------------------------------
[Please enter your report here]
Hey all,
I noticed something that seems strange. I ran the following
script on two machines, one running 5.00503 and one running
5.6 (full details below), and got two different outputs.
--<script>--
#!/usr/bin/perl
use strict;
my $text = "Charles Bronson";
$text =~ s/\B\w//g;
print "here it is: $text\n\n";
--</script>--
output on 5.6 was: here it is: Cals Bosn
output on 5.00503 was: here it is: C B
It appears as though 5.00503 is getting rid of all \w
characters in the string that aren't preceded by a word
boundary (which is what I expected), while 5.6 is removing
only every other \w character in each word.
I queried my local Seattle Perl Users Group and a few people
confirmed that their pre-5.6 versions treat the regex
differently.
Colin Meyer writes:
---
output of 5.00405: here it is: C B
5.6.0: here it is: Cals Bosn
5.6.1: here it is: Cals Bosn
5.7.1: here it is: Cals Bosn
The same effect can be seen from:
perl -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'
Versions prior to 5.6.0 print:
2
3
4
5
6
7
while 5.6.0 and later print:
2
4
6
---
He also suggested I check out the one-liner below.
---
running on 5.6.0:
[ben@va01 ben]$ perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'
Compiling REx `\B\w'
size 3 first at 1
1: NBOUND(2)
2: ALNUM(3)
3: END(0)
stclass `NBOUND' minlen 1
Matching REx `\B\w' against `abcdefg'
Setting an EVAL scope, savestack=6
1 <a> <bcdefg> | 1: NBOUND
1 <a> <bcdefg> | 2: ALNUM
2 <ab> <cdefg> | 3: END
Match successful!
2
Matching REx `\B\w' against `cdefg'
Setting an EVAL scope, savestack=3
3 <abc> <defg> | 1: NBOUND
3 <abc> <defg> | 2: ALNUM
4 <abcd> <efg> | 3: END
Match successful!
4
Matching REx `\B\w' against `efg'
Setting an EVAL scope, savestack=3
5 <abcde> <fg> | 1: NBOUND
5 <abcde> <fg> | 2: ALNUM
6 <abcdef> <g> | 3: END
Match successful!
6
Matching REx `\B\w' against `g'
Freeing REx: `\B\w'
[ben@va01 ben]$
now running on 5.00503:
[ben@roxanne ben]$ perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'
compiling RE `\B\w'
size 3 first at 1
1: NBOUND(2)
2: ALNUM(3)
3: END(0)
stclass `NBOUND' minlen 1
Matching `\B\w' against `abcdefg'
Setting an EVAL scope, savestack=6
1 <a> <bcdefg> | 1: NBOUND
1 <a> <bcdefg> | 2: ALNUM
2 <ab> <cdefg> | 3: END
2
Matching `\B\w' against `cdefg'
Setting an EVAL scope, savestack=3
2 <ab> <cdefg> | 1: NBOUND
2 <ab> <cdefg> | 2: ALNUM
3 <abc> <defg> | 3: END
3
Matching `\B\w' against `defg'
Setting an EVAL scope, savestack=3
3 <abc> <defg> | 1: NBOUND
3 <abc> <defg> | 2: ALNUM
4 <abcd> <efg> | 3: END
4
Matching `\B\w' against `efg'
Setting an EVAL scope, savestack=3
4 <abcd> <efg> | 1: NBOUND
4 <abcd> <efg> | 2: ALNUM
5 <abcde> <fg> | 3: END
5
Matching `\B\w' against `fg'
Setting an EVAL scope, savestack=3
5 <abcde> <fg> | 1: NBOUND
5 <abcde> <fg> | 2: ALNUM
6 <abcdef> <g> | 3: END
6
Matching `\B\w' against `g'
Setting an EVAL scope, savestack=3
6 <abcdef> <g> | 1: NBOUND
6 <abcdef> <g> | 2: ALNUM
7 <abcdefg> <> | 3: END
7
[ben@roxanne ben]$
---
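The position lists above can also be reproduced with a small
script, which may be handy for checking a given perl build.
This is only a sketch: the sub name match_ends is made up for
illustration, and the expected list 2..7 is simply the
pre-5.6.0 result reported above.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper: collect the end offset (pos) of every
# /\B\w/g match in the given string.
sub match_ends {
    my ($t) = @_;
    my @ends;
    push @ends, pos($t) while $t =~ m/\B\w/g;
    return @ends;
}

my @got      = match_ends("abcdefg");
my @expected = (2 .. 7);    # every character after the first one

print "got:      @got\n";
print "expected: @expected\n";
print "@got" eq "@expected" ? "ok\n" : "this perl skips matches\n";
```

A perl that shows the 5.6 behavior described above should
report 2 4 6 on the "got" line and fall through to the second
message.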
I couldn't see anything obvious in perldelta that would
indicate that the two versions should treat this
differently. Does anyone know why this might happen?
In case you were wondering, I was using this particular regex
to extract people's initials. When I moved to a 5.6 version of
perl I noticed the change in behavior.
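For the initials use case, one possible way to sidestep the
question is to capture the first character of each word rather
than delete everything else. The following is a minimal
sketch, not a confirmed workaround: it has not been run on
5.6.0, and the variable names are only illustrative. The split
variant at least does not depend on \b or \B at all.

```perl
#!/usr/bin/perl
use strict;

my $text = "Charles Bronson";

# Capture the first word character after each word boundary,
# instead of deleting every character that is not one.
my @initials = $text =~ /\b(\w)/g;
my $by_match = join ' ', @initials;

# Same idea with no regex assertion: split on whitespace and
# keep the first character of each piece.
my $by_split = join ' ', map { substr($_, 0, 1) } split ' ', $text;

print "by match: $by_match\n";
print "by split: $by_split\n";
```

On a perl where \B\w behaves as in the 5.00503 output above,
both lines should read "C B", the same result as the original
substitution.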
I have included below the details of the perl 5.00503 install
that I used for testing.
Thanks for your time and effort.
- Ben Burnett
-----------------------------------------------------------------
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
Platform:
osname=linux, osvers=2.2.1-ac1, archname=i386-linux
uname='linux porky.devel.redhat.com 2.2.1-ac1 #1 smp mon feb 1 17:44:44 est 1999 i686 unknown '
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef useperlio=undef d_sfio=undef
Compiler:
cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
stdchar='char', d_stdstdio=undef, usevfork=false
intsize=4, longsize=4, ptrsize=4, doublesize=8
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
libc=, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Characteristics of this binary (from libperl):
Built under linux
Compiled at Apr 6 1999 23:34:07
@INC:
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.
-----------------------------------------------------------------
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl v5.6.0:
Configured by prospector at Mon Aug 7 10:58:30 EDT 2000.
Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
Platform:
osname=linux, osvers=2.2.5-22smp, archname=i386-linux
uname='linux porky.devel.redhat.com 2.2.5-22smp #1 smp wed jun 2 09:11:51 edt 1999 i686 unknown '
config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Uuselargefiles'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=undef
use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
Compiler:
cc='gcc', optimize='-O2 -march=i386 -mcpu=i686', gccversion=2.96 20000731 (experimental)
cppflags='-fno-strict-aliasing'
ccflags ='-fno-strict-aliasing'
stdchar='char', d_stdstdio=define, usevfork=false
intsize=4, longsize=4, ptrsize=4, doublesize=8
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -ldl -lm -lc -lcrypt
libc=/lib/libc-2.1.92.so, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.6.0:
/usr/lib/perl5/5.6.0/i386-linux
/usr/lib/perl5/5.6.0
/usr/lib/perl5/site_perl/5.6.0/i386-linux
/usr/lib/perl5/site_perl/5.6.0
/usr/lib/perl5/site_perl
.
---
Environment for perl v5.6.0:
HOME=/home/ben
LANG=en_US
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/mysql/bin/:/home/ben/bin
PERL_BADLANG (unset)
SHELL=/bin/bash