[perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
From: Doug Cook
Date: June 12, 2010 03:42
Subject: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Message ID: rt-3.6.HEAD-4976-1276283723-1792.75680-75-0@perl.org
# New Ticket Created by Doug Cook
# Please include the string: [perl #75680]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75680 >
This is a bug report for perl from doug@ablegrape.com,
generated with the help of perlbug 1.39 running under perl v5.8.9.
-----------------------------------------------------------------
My program worked fine under previous versions of Perl on Mac OS X (prior to Snow Leopard).
Now it dies under 5.8.9, 5.10.0 and 5.12.1 with "Malformed UTF-8 character (fatal)", even though the input data is unchanged and is, as far as I can tell, perfectly valid UTF-8.
I've isolated the failure to the test case included here. It shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'", even the simple regex fails.
My best guess is that the regex engine is, for some reason, backing up by bytes rather than by characters within my string, and so resuming in the middle of a multi-byte character. The string itself is perfectly valid.
#!/usr/bin/perl
use strict vars;
use utf8;
binmode STDOUT, ":utf8";
my $e = "Böck";
if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }
# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";
# these die
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";
# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";
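The byte-offset guess above can be illustrated separately from the regex engine. The following is a minimal sketch, not part of the original test case and not a diagnosis of the actual bug: it only shows that, in the UTF-8 encoding of "Böck", a position reached by counting bytes can land on a continuation byte, and that text starting there is not valid UTF-8. The variable names are hypothetical; the string is written as "B\x{f6}ck" so that the sketch does not depend on the source file's encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# "Böck" encoded as UTF-8 octets: 42 C3 B6 63 6B (the ö is the pair C3 B6).
my $bytes = encode('UTF-8', "B\x{f6}ck");

# Copies starting at byte offset 1 (lead byte C3) and offset 2 (byte B6).
my $from_lead = substr($bytes, 1);
my $from_mid  = substr($bytes, 2);

# A UTF-8 continuation byte has the bit pattern 10xxxxxx.
my $starts_mid_char = (ord($from_mid) & 0xC0) == 0x80 ? 1 : 0;

# Strict decoding croaks on malformed input; LEAVE_SRC keeps the inputs intact.
my $check   = Encode::FB_CROAK | Encode::LEAVE_SRC;
my $ok_lead = eval { decode('UTF-8', $from_lead, $check); 1 } ? 1 : 0;
my $ok_mid  = eval { decode('UTF-8', $from_mid,  $check); 1 } ? 1 : 0;

printf "offset 2 is a continuation byte: %d\n", $starts_mid_char;
printf "decodes from offset 1: %d, from offset 2: %d\n", $ok_lead, $ok_mid;
```

If the engine really does resume matching at such an offset, a "Malformed UTF-8 character" error on otherwise valid input would be the expected symptom; whether that is what happens here is an assumption, not something this sketch establishes.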
[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=critical
---
Site configuration information for perl v5.8.9:
Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.
Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
Platform:
osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=define uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
optimize='-Os',
cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lutil -lc
perllibs=-ldl -lm -lutil -lc
libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'
Locally applied patches:
/Library/Perl/Updates/<version> comes before system perl directories
installprivlib and installarchlib points to the Updates directory
6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized
---
@INC for perl v5.8.9:
/Library/Perl/Updates/5.8.9
/System/Library/Perl/5.8.9/darwin-thread-multi-2level
/System/Library/Perl/5.8.9
/Library/Perl/5.8.9/darwin-thread-multi-2level
/Library/Perl/5.8.9
/Network/Library/Perl/5.8.9/darwin-thread-multi-2level
/Network/Library/Perl/5.8.9
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.9
/Library/Perl/5.8.8
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl/5.8.1
.
---
Environment for perl v5.8.9:
DYLD_LIBRARY_PATH (unset)
HOME=/Users/cook
LANG=en_US.UTF-8
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin
PERL_BADLANG (unset)
SHELL=/bin/bash