
Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8

From: karl williamson
Date: June 13, 2010 08:23
Subject: Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Message ID: 4C14F7D1.7030405@khwilliamson.com

Chas. Owens wrote:
> As a work around, I suggest you use the \x{} literal escape:
> 
> my $e = "B\x{f6}ck";
> 
> It seems to work on my OS X machines.

Unfortunately, this workaround only works because it avoids upgrading
$e to utf8.  If you use "B\x{101}ck" instead, the "Malformed UTF-8
character" error remains.  Also, because of an unrelated bug, /i
matching will not work properly for \x{f6}.
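
To make the distinction concrete, here is a minimal sketch (not part of the original report; the variable names are only illustrative) that checks the internal utf8 flag of both literals. It assumes a perl recent enough to provide the utf8:: functions without loading the pragma. It should report that "B\x{f6}ck" is not flagged as utf8, while "B\x{101}ck" is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A code point below 0x100 written with \x{} leaves the string in
# perl's native 8-bit representation, so the utf8 flag stays off.
my $latin1 = "B\x{f6}ck";

# A code point above 0xFF cannot fit in a single byte, so perl has to
# store the whole string as UTF-8 internally and turns the flag on.
my $wide = "B\x{101}ck";

printf "\\x{f6}:   is_utf8=%d\n", utf8::is_utf8($latin1) ? 1 : 0;
printf "\\x{101}:  is_utf8=%d\n", utf8::is_utf8($wide)   ? 1 : 0;

# Upgrading a copy by hand changes only the internal representation;
# the characters, and therefore string equality, are unaffected.
my $upgraded = $latin1;
utf8::upgrade($upgraded);
printf "upgraded: is_utf8=%d, equal to original=%d\n",
    utf8::is_utf8($upgraded) ? 1 : 0,
    $upgraded eq $latin1     ? 1 : 0;
```

Running the failing patterns from the report against the upgraded copy, rather than the plain "B\x{f6}ck" literal, is one way to check whether a given perl is still affected.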
> 
> On Fri, Jun 11, 2010 at 15:15, Doug Cook <perlbug-followup@perl.org> wrote:
>> # New Ticket Created by  Doug Cook
>> # Please include the string:  [perl #75680]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75680 >
>>
>>
>>
>>
>> This is a bug report for perl from doug@ablegrape.com,
>> generated with the help of perlbug 1.39 running under perl v5.8.9.
>>
>>
>> -----------------------------------------------------------------
>> My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).
>>
>> Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.
>>
>> I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.
>>
>> My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.
>>
>> #!/usr/bin/perl
>>
>> use strict vars;
>> use utf8;
>> binmode STDOUT, ":utf8";
>>
>> my $e = "Böck";
>>
>> if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }
>>
>> # this succeeds (failed before with use encoding 'utf8', unknown why)
>> if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
>> print "success with simple\n";
>>
>> # these die
>> if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
>> print "success with medium\n";
>> if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
>> print "success with medium\n";
>>
>> # the original, full expression.
>> if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
>> print "success with complex\n";
>>
>>
>>
>> [Please do not change anything below this line]
>> -----------------------------------------------------------------
>> ---
>> Flags:
>>    category=core
>>    severity=critical
>> ---
>> Site configuration information for perl v5.8.9:
>>
>> Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.
>>
>> Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
>>  Platform:
>>    osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
>>    uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
>>    config_args='-ds -e -Dprefix=/usr -Dccflags=-g  -pipe  -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
>>    hint=recommended, useposix=true, d_sigaction=define
>>    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
>>    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
>>    use64bitint=define use64bitall=define uselongdouble=undef
>>    usemymalloc=n, bincompat5005=undef
>>  Compiler:
>>    cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
>>    optimize='-Os',
>>    cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
>>    ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
>>    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
>>    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
>>    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
>>    alignbytes=8, prototype=define
>>  Linker and Libraries:
>>    ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
>>    libpth=/usr/local/lib /usr/lib
>>    libs=-ldbm -ldl -lm -lutil -lc
>>    perllibs=-ldl -lm -lutil -lc
>>    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
>>    gnulibc_version=''
>>  Dynamic Linking:
>>    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
>>    cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'
>>
>> Locally applied patches:
>>    /Library/Perl/Updates/<version> comes before system perl directories
>>    installprivlib and installarchlib points to the Updates directory
>>    6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized
>>
>> ---
>> @INC for perl v5.8.9:
>>    /Library/Perl/Updates/5.8.9
>>    /System/Library/Perl/5.8.9/darwin-thread-multi-2level
>>    /System/Library/Perl/5.8.9
>>    /Library/Perl/5.8.9/darwin-thread-multi-2level
>>    /Library/Perl/5.8.9
>>    /Network/Library/Perl/5.8.9/darwin-thread-multi-2level
>>    /Network/Library/Perl/5.8.9
>>    /Network/Library/Perl
>>    /System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level
>>    /System/Library/Perl/Extras/5.8.9
>>    /Library/Perl/5.8.8
>>    /Library/Perl/5.8.6/darwin-thread-multi-2level
>>    /Library/Perl/5.8.6
>>    /Library/Perl/5.8.1
>>    .
>>
>> ---
>> Environment for perl v5.8.9:
>>    DYLD_LIBRARY_PATH (unset)
>>    HOME=/Users/cook
>>    LANG=en_US.UTF-8
>>    LANGUAGE (unset)
>>    LD_LIBRARY_PATH (unset)
>>    LOGDIR (unset)
>>    PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin
>>    PERL_BADLANG (unset)
>>    SHELL=/bin/bash
>>
>>
> 
> 
> 

