
[ID 20010426.002] Word boundary regex treated differently by 5.6 and 5.00503

From: Bens maintenance account
Date: April 26, 2001 10:42
Subject: [ID 20010426.002] Word boundary regex treated differently by 5.6 and 5.00503
Message ID: 200104261733.f3QHX7K10625@mail.ocentrix.net

This is a bug report for perl from benwa@ocentrix.com,
generated with the help of perlbug 1.28 running under perl v5.6.0.


-----------------------------------------------------------------
[Please enter your report here]

Hey all,

I noticed something that seems strange.  I ran the following
script on two machines, one running 5.00503 and one running
5.6 (full details below), and got two different outputs.

--<script>--
#!/usr/bin/perl

use strict;

my $text = "Charles Bronson";

$text =~ s/\B\w//g;

print "here it is: $text\n\n";
--</script>--

output on 5.6 was:      here it is: Cals Bosn
output on 5.00503 was:  here it is: C B

It appears as though 5.00503 is getting rid of all \w
characters in the string that aren't preceded by a word
boundary (which is what I expected), while 5.6 is removing
only every other \w character in each word, starting with
the second.
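
As a cross-check of the expected result: \B followed by \w can only
match when the preceding character is also a \w, so by definition the
pattern should behave like a lookbehind (?<=\w)\w.  The sketch below
uses that form.  The helper name strip_non_initial is made up for
illustration, and whether this form sidesteps the difference on 5.6 is
an assumption that has not been tested here.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper (not part of the original report): delete every
# word character that directly follows another word character, using a
# lookbehind instead of \B.
sub strip_non_initial {
    my ($text) = @_;
    $text =~ s/(?<=\w)\w//g;
    return $text;
}

print "here it is: ", strip_non_initial("Charles Bronson"), "\n";
```

On a perl where \B\w and (?<=\w)\w agree, this should print the same
"C B" result that 5.00503 gives for the original script.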

I queried my local Seattle Perl Users Group, and a few people
confirmed that their pre-5.6 versions treat the regex
differently.

Colin Meyer writes:
---
output of 5.00405: here it is: C B
5.6.0:             here it is: Cals Bosn
5.6.1:             here it is: Cals Bosn
5.7.1:             here it is: Cals Bosn

The same effect can be seen from:
perl -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'

versions prior to 5.6.0 print:
2
3
4
5
6
7
while 5.6.0 and later print:
2
4
6
---

He also suggested I check out the one-liner below.

---
running on 5.6.0:
[ben@va01 ben]$ perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'
Compiling REx `\B\w'
size 3 first at 1
   1: NBOUND(2)
   2: ALNUM(3)
   3: END(0)
stclass `NBOUND' minlen 1
Matching REx `\B\w' against `abcdefg'
  Setting an EVAL scope, savestack=6
   1 <a> <bcdefg>         |  1:  NBOUND
   1 <a> <bcdefg>         |  2:  ALNUM
   2 <ab> <cdefg>         |  3:  END
Match successful!
2
Matching REx `\B\w' against `cdefg'
  Setting an EVAL scope, savestack=3
   3 <abc> <defg>         |  1:  NBOUND
   3 <abc> <defg>         |  2:  ALNUM
   4 <abcd> <efg>         |  3:  END
Match successful!
4
Matching REx `\B\w' against `efg'
  Setting an EVAL scope, savestack=3
   5 <abcde> <fg>         |  1:  NBOUND
   5 <abcde> <fg>         |  2:  ALNUM
   6 <abcdef> <g>         |  3:  END
Match successful!
6
Matching REx `\B\w' against `g'
Freeing REx: `\B\w'
[ben@va01 ben]$

now running on 5.00503:
[ben@roxanne ben]$ perl -M're debug' -le '$t = "abcdefg"; print pos $t while $t =~ m/\B\w/g'
compiling RE `\B\w'
size 3 first at 1
   1: NBOUND(2)
   2: ALNUM(3)
   3: END(0)
stclass `NBOUND' minlen 1
Matching `\B\w' against `abcdefg'
  Setting an EVAL scope, savestack=6
   1 <a> <bcdefg>         |  1:  NBOUND
   1 <a> <bcdefg>         |  2:  ALNUM
   2 <ab> <cdefg>         |  3:  END
2
Matching `\B\w' against `cdefg'
  Setting an EVAL scope, savestack=3
   2 <ab> <cdefg>         |  1:  NBOUND
   2 <ab> <cdefg>         |  2:  ALNUM
   3 <abc> <defg>         |  3:  END
3
Matching `\B\w' against `defg'
  Setting an EVAL scope, savestack=3
   3 <abc> <defg>         |  1:  NBOUND
   3 <abc> <defg>         |  2:  ALNUM
   4 <abcd> <efg>         |  3:  END
4
Matching `\B\w' against `efg'
  Setting an EVAL scope, savestack=3
   4 <abcd> <efg>         |  1:  NBOUND
   4 <abcd> <efg>         |  2:  ALNUM
   5 <abcde> <fg>         |  3:  END
5
Matching `\B\w' against `fg'
  Setting an EVAL scope, savestack=3
   5 <abcde> <fg>         |  1:  NBOUND
   5 <abcde> <fg>         |  2:  ALNUM
   6 <abcdef> <g>         |  3:  END
6
Matching `\B\w' against `g'
  Setting an EVAL scope, savestack=3
   6 <abcdef> <g>         |  1:  NBOUND
   6 <abcdef> <g>         |  2:  ALNUM
   7 <abcdefg> <>         |  3:  END
7
[ben@roxanne ben]$ 
---
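
The two traces above can be condensed into a small script that collects
the pos() values, which may be handy as a regression check.  This is
only a sketch: the helper name match_positions is invented here, and
the expected list 2..7 is taken from the 5.00503 output above.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper (not from the original report): return the pos()
# value recorded after each successive m/\B\w/g match against $t.
sub match_positions {
    my ($t) = @_;
    my @positions;
    push @positions, pos($t) while $t =~ m/\B\w/g;
    return @positions;
}

my @got = match_positions("abcdefg");
print "@got\n";
```

A perl that behaves like 5.00503 should print "2 3 4 5 6 7"; one that
behaves like the 5.6.0 trace would print "2 4 6".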

I couldn't see anything obvious in perldelta that would
indicate that the two versions should treat this
differently.  Does anyone know why this might happen?

In case you were wondering, I was using this particular regex
to extract people's initials.  When I moved to a 5.6 version of
perl I noticed the change in behavior.
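
For the initials use case specifically, one possible workaround is to
avoid \B altogether and take the first character of each
whitespace-separated word.  This is a minimal sketch, not a fix for
the regex difference itself; the function name initials is
hypothetical, and it assumes names are separated by whitespace.

```perl
#!/usr/bin/perl
use strict;

# Hypothetical helper (not from the original report): build the
# initials from the first character of each whitespace-separated word.
sub initials {
    my ($name) = @_;
    return join ' ', map { substr($_, 0, 1) } split ' ', $name;
}

print "here it is: ", initials("Charles Bronson"), "\n\n";
```

Because this uses only split and substr, it should not depend on how
either version handles \B.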

I have included below the details of the perl 5.00503 install
that I used for testing.

Thanks for your time and effort.

- Ben Burnett

-----------------------------------------------------------------
Summary of my perl5 (5.0 patchlevel 5 subversion 3) configuration:
  Platform:
    osname=linux, osvers=2.2.1-ac1, archname=i386-linux
    uname='linux porky.devel.redhat.com 2.2.1-ac1 #1 smp mon feb 1 17:44:44 est 1999 i686 unknown '
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef useperlio=undef d_sfio=undef
  Compiler:
    cc='cc', optimize='-O2', gccversion=egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
    cppflags='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    ccflags ='-Dbool=char -DHAS_BOOL -I/usr/local/include'
    stdchar='char', d_stdstdio=undef, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
 
 
Characteristics of this binary (from libperl):
  Built under linux
  Compiled at Apr  6 1999 23:34:07
  @INC:
    /usr/lib/perl5/5.00503/i386-linux
    /usr/lib/perl5/5.00503
    /usr/lib/perl5/site_perl/5.005/i386-linux
    /usr/lib/perl5/site_perl/5.005
    .
-----------------------------------------------------------------

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl v5.6.0:

Configured by prospector at Mon Aug  7 10:58:30 EDT 2000.

Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
  Platform:
    osname=linux, osvers=2.2.5-22smp, archname=i386-linux
    uname='linux porky.devel.redhat.com 2.2.5-22smp #1 smp wed jun 2 09:11:51 edt 1999 i686 unknown '
    config_args='-des -Doptimize=-O2 -march=i386 -mcpu=i686 -Dcc=gcc -Dcccdlflags=-fPIC -Dinstallprefix=/usr -Dprefix=/usr -Darchname=i386-linux -Dd_dosuid -Dd_semctl_semun -Di_db -Di_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Uuselargefiles'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=undef 
    use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
  Compiler:
    cc='gcc', optimize='-O2 -march=i386 -mcpu=i686', gccversion=2.96 20000731 (experimental)
    cppflags='-fno-strict-aliasing'
    ccflags ='-fno-strict-aliasing'
    stdchar='char', d_stdstdio=define, usevfork=false
    intsize=4, longsize=4, ptrsize=4, doublesize=8
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=4
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -ldl -lm -lc -lcrypt
    libc=/lib/libc-2.1.92.so, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.6.0:
    /usr/lib/perl5/5.6.0/i386-linux
    /usr/lib/perl5/5.6.0
    /usr/lib/perl5/site_perl/5.6.0/i386-linux
    /usr/lib/perl5/site_perl/5.6.0
    /usr/lib/perl5/site_perl
    .

---
Environment for perl v5.6.0:
    HOME=/home/ben
    LANG=en_US
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/mysql/bin/:/home/ben/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash