[perl #75680] Certain regex patterns cause fatal errors with valid UTF-8

Doug Cook
June 12, 2010 03:42
# New Ticket Created by  Doug Cook 
# Please include the string:  [perl #75680]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl v5.8.9.

My program worked fine under previous versions of Perl on Mac OS X (prior to Snow Leopard).

Now it dies under 5.8.9, 5.10.0 and 5.12.1 with "Malformed UTF-8 character (fatal)", even though the input data is unchanged and is, as far as I can tell, perfectly valid UTF-8.

I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails. 

My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.
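
The guess above can be illustrated with a small sketch. It only shows why a valid UTF-8 string looks malformed when read from the wrong byte offset; it is not a diagnosis of what the regex engine actually does. The helper name valid_from is hypothetical, and the string is written with an escape (\x{F6}) rather than a literal "ö" so the sketch does not depend on the source file's encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Same characters as "Böck"; \x{F6} is LATIN SMALL LETTER O WITH DIAERESIS.
my $str   = "B\x{F6}ck";
my $bytes = encode('UTF-8', $str);    # octets: 42 C3 B6 63 6B

# Hypothetical helper: returns 1 if the octets starting at $offset decode
# as strict UTF-8, and 0 if decoding croaks (e.g. on a continuation byte).
sub valid_from {
    my ($octets, $offset) = @_;
    my $tail = substr($octets, $offset);    # copy, so $octets is untouched
    return eval { decode('UTF-8', $tail, Encode::FB_CROAK); 1 } ? 1 : 0;
}

printf "bytes: %s\n",
    join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;

for my $offset (0 .. length($bytes) - 1) {
    printf "offset %d: %s\n", $offset,
        valid_from($bytes, $offset) ? 'valid start' : 'mid-character';
}
```

Only offset 2 (the continuation byte B6 of "ö") is reported as mid-character. That is the kind of position a byte-wise backup would have to land on to produce a "Malformed UTF-8 character" error from otherwise valid data.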


use strict 'vars';
use utf8;
binmode STDOUT, ":utf8";

my $e = "Böck";

if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }

# this succeeds (failed before with use encoding 'utf8', unknown why)
if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
print "success with simple\n";

# these die 
if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }        
print "success with medium\n";
if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
print "success with medium\n";

# the original, full expression.
if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
print "success with complex\n";

[Please do not change anything below this line]
Site configuration information for perl v5.8.9:

Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.

Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
    osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
    uname='darwin 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe  -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=define use64bitall=define uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
    cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:
    /Library/Perl/Updates/<version> comes before system perl directories
    installprivlib and installarchlib points to the Updates directory
    6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized

@INC for perl v5.8.9:

Environment for perl v5.8.9:
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
