perl.perl5.porters, postings from July 2000
[ID 20000731.001] regex optimizer problems with utf8 and (??{ ... })
From: Jeffrey Friedl
Date: July 31, 2000 11:14
Subject: [ID 20000731.001] regex optimizer problems with utf8 and (??{ ... })
Message ID: 200007310746.AAA21772@ventrue.yahoo.com
This is a bug report for perl from jfriedl@yahoo-inc.com,
generated with the help of perlbug 1.28 running under perl v5.6.0.
-----------------------------------------------------------------
[Please enter your report here]
Hiho,
I think I've found a place where the regex optimizer is rejecting a match
that it shouldn't.
I would expect that the program:
    #!/usr/local/bin/perl -w
    use re 'debug';
    use strict;
    use utf8;
    $_ = "A \x{263a} B z C";
    if (m/A . B (??{ "z" }) C/) {
        print "match\n";
    } else {
        print "no match\n";
    }
would print that there was a match.
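One way to narrow this down is to run a few variants of the same match side by side. The following is only a sketch: the run_variants helper, the case labels, and the ASCII-only control string "A x B z C" are illustrative assumptions and not part of the original report. A perl without this problem would be expected to print "match" for all three cases; which of them fail on an affected perl is exactly what the sketch is meant to show.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the report): try the reported match plus
# two controls, and return [label, result] pairs.
sub run_variants {
    my @cases = (
        # The case from the report: wide character plus (??{ ... }).
        [ 'wide char, (??{ })',   "A \x{263a} B z C", qr/A . B (??{ "z" }) C/ ],
        # Control 1: same string, but a literal z instead of (??{ ... }).
        [ 'wide char, literal z', "A \x{263a} B z C", qr/A . B z C/ ],
        # Control 2: same pattern, but an ASCII-only subject string.
        [ 'ASCII only, (??{ })',  "A x B z C",        qr/A . B (??{ "z" }) C/ ],
    );
    my @results;
    for my $case (@cases) {
        my ($label, $string, $re) = @$case;
        push @results, [ $label, ($string =~ $re ? "match" : "no match") ];
    }
    return @results;
}

printf "%-22s %s\n", @$_ for run_variants();
```

If only the first case reports "no match", that points at the combination of a multi-byte character and (??{ ... }) rather than at either feature alone.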
Here's what I'm getting (when piped through something to show non-ASCII
bytes as {FF}):
% utf8-5
Compiling REx `A . B (??{ "z" }) C'
size 11 first at 1
synthetic stclass `ANYOF[A]'.
1: EXACT <A >(3)
3: ANYUTF8(4)
4: EXACT < B >(6)
6: LOGICAL[2](7)
7: EVAL(9)
9: EXACT < C>(11)
11: END(0)
anchored ` B ' at 3 floating ` C' at 6..2147483647 (checking anchored) stclass `ANYOF[A]' minlen 8 with eval
Guessing start of match, REx `A . B (??{ "z" }) C' against `A {e2}{98}{ba} B z C'...
Found anchored substr ` B ' at offset 5...
Found floating substr ` C' at offset 9...
This position contradicts STCLASS...
Trying anchored substr starting at offset 8...
Did not find anchored substr ` B '...
Match rejected by optimizer
Freeing REx: `A . B (??{ "z" }) C'
no match
The {e2}{98}{ba} is the proper UTF-8 for the single smiley character.
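The byte sequence and the offsets in the debug output can be checked independently. This is a minimal sketch that assumes the Encode module, which is not part of perl v5.6.0 but is bundled with later perls (5.8 onward); the show_bytes helper is a hypothetical stand-in for the "something" the output was piped through. It prints the UTF-8 bytes of the test string and compares character offsets with byte offsets for the two substrings the optimizer looks for.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show every non-ASCII byte as {xx} in lowercase hex.
sub show_bytes {
    my ($octets) = @_;
    (my $out = $octets) =~ s/([\x80-\xff])/sprintf("{%02x}", ord($1))/ge;
    return $out;
}

my $chars  = "A \x{263a} B z C";        # 9 characters
my $octets = encode("UTF-8", $chars);   # 11 bytes

print show_bytes($octets), "\n";        # A {e2}{98}{ba} B z C

# Character offsets versus byte offsets of the optimizer's substrings.
printf "' B ' char offset %d, byte offset %d\n",
    index($chars, " B "), index($octets, " B ");   # 3 and 5
printf "' C'  char offset %d, byte offset %d\n",
    index($chars, " C"), index($octets, " C");     # 7 and 9
```

The byte offsets 5 and 9 are the ones in the "Found anchored substr" and "Found floating substr" lines, while the compiled pattern lists ` B ' as anchored at 3, which is the character offset. One possible reading, offered only as a guess, is that the optimizer mixes byte and character offsets when the string contains a multi-byte character.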
Jeffrey
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl v5.6.0:
Configured by jfriedl at Sat Jul 29 20:09:33 PDT 2000.
Summary of my perl5 (revision 5.0 version 6 subversion 0) configuration:
Platform:
osname=linux, osvers=2.2.15, archname=i686-linux
uname='linux fummy.dsl.yahoo.com 2.2.16 #6 smp sun jul 23 11:26:16 pdt 2000 i686 unknown '
config_args='-ds -e -A optimize=-g'
hint=previous, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=undef d_sfio=undef uselargefiles=define
use64bitint=undef use64bitall=undef uselongdouble=undef usesocks=undef
Compiler:
cc='cc', optimize='-O2 -g', gccversion=pgcc-2.91.66 19990314 (egcs-1.1.2 release)
cppflags='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
stdchar='char', d_stdstdio=define, usevfork=false
intsize=4, longsize=4, ptrsize=4, doublesize=8
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=4, usemymalloc=n, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -lndbm -lgdbm -ldb -ldl -lm -lc -lposix -lcrypt
libc=/lib/libc-2.1.1.so, so=so, useshrplib=false, libperl=libperl.a
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.6.0:
/home/jfriedl/lib/perl
/home/jfriedl/lib/perl/yahoo
/usr/local/lib/perl5/5.6.0/i686-linux
/usr/local/lib/perl5/5.6.0
/usr/local/lib/perl5/site_perl/5.6.0/i686-linux
/usr/local/lib/perl5/site_perl/5.6.0
/usr/local/lib/perl5/site_perl
.
---
Environment for perl v5.6.0:
HOME=/home/jfriedl
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH=/usr/local/pgsql/lib:/home/jfriedl/src/rvplayer5.0
LOGDIR (unset)
PATH=/home/jfriedl/bin:/home/jfriedl/common/bin:/usr/local/gcc-2.95.2/bin:.:/usr/local/pgsql/bin:/usr/local/bin:/usr/X11R6/bin:/bin:/usr/bin:/usr/sbin:/sbin:/home/jfriedl/src/rvplayer5.0
PERLLIB=/home/jfriedl/lib/perl:/home/jfriedl/lib/perl/yahoo
PERL_BADLANG (unset)
SHELL=/bin/tcsh