perl.perl5.porters | Postings from June 2010
Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
From: Chas. Owens
Date: June 12, 2010 06:15
Subject: Re: [perl #75680] Certain regex patterns cause fatal errors with valid UTF-8
Message ID: AANLkTimRfuYOs0j5OPvlBUGQXBiwlvfcj_hux_cCVRXy@mail.gmail.com
As a workaround, I suggest using a \x{} escape instead of the literal character:
my $e = "B\x{f6}ck";
It seems to work on my OS X machines.
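A minimal sketch, not from the original thread, of why the escape may sidestep the problem. The assumption (not a confirmed diagnosis) is that a \x{} escape below 0x100 leaves the string in Perl's native 8-bit form, while a literal "ö" read under `use utf8` yields a string stored as UTF-8 internally, which is the representation the failing code paths work on. The script compares the escaped form with the same text decoded from UTF-8 octets, reports each form's internal representation via `utf8::is_utf8`, and runs the three short patterns from the report against the escaped string. On a perl without the bug, none of them should match "Böck" and none should die.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Workaround: write the o-umlaut as a \x{} escape rather than typing it
# literally under "use utf8".
my $e = "B\x{f6}ck";

# The same text decoded from its UTF-8 octets (42 C3 B6 63 6B).
my $decoded = decode('UTF-8', "B\xC3\xB6ck");

# Same characters, different internal storage (assumed to be what matters).
print "same text: ", ($e eq $decoded ? "yes" : "no"), "\n";
print "escape form is UTF-8 internally: ",  (utf8::is_utf8($e)       ? "yes" : "no"), "\n";
print "decoded form is UTF-8 internally: ", (utf8::is_utf8($decoded) ? "yes" : "no"), "\n";

# The short patterns from the bug report; none should match "Böck".
for my $re (qr/.*?[x]$/, qr/.*?\p{Space}$/i, qr/.*?[xyz]$/) {
    print $re, ": ", ($e =~ $re ? "matched" : "no match"), "\n";
}
```

One caveat, also an assumption about scope: if the escaped string is later concatenated with a UTF-8-flagged string, Perl upgrades the result to its internal UTF-8 form, so the workaround may stop applying at that point.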
On Fri, Jun 11, 2010 at 15:15, Doug Cook <perlbug-followup@perl.org> wrote:
> # New Ticket Created by Doug Cook
> # Please include the string: [perl #75680]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75680 >
>
>
>
>
> This is a bug report for perl from doug@ablegrape.com,
> generated with the help of perlbug 1.39 running under perl v5.8.9.
>
>
> -----------------------------------------------------------------
> My program worked fine under previous versions of Perl on MacOS (prior to Snow Leopard).
>
> Now it dies under 5.8.9, 5.10.0 and 5.12.1, with "Malformed UTF-8 character (fatal)" - but the input data is the same, and is, as far as I can tell, perfectly valid UTF-8.
>
> I've isolated the failure to a test case, included here, which shows a simple expression that works, two (very) slightly more complex expressions that fail, and the original complex expression from my code. As far as I can tell, all of these should work. Oddly, if I add "use encoding 'utf8'" even the simple regex fails.
>
> My best guess is that perhaps for some reason the regex engine is backing up by bytes within my string, and starting in the middle of a character. The string itself is perfectly valid.
>
> #!/usr/bin/perl
>
> use strict vars;
> use utf8;
> binmode STDOUT, ":utf8";
>
> my $e = "Böck";
>
> if (utf8::is_utf8($e)) { print "yep, is UTF8: $e\n"; }
>
> # this succeeds (failed before with use encoding 'utf8', unknown why)
> if ($e=~ m/.*?[x]$/) { print "matched simple\n"; }
> print "success with simple\n";
>
> # these die
> if ($e=~ m/.*?\p{Space}$/i) { print "matched medium\n"; }
> print "success with medium\n";
> if ($e=~ m/.*?[xyz]$/) { print "matched medium\n"; }
> print "success with medium\n";
>
> # the original, full expression.
> if ($e =~ m/(.*?)[,\p{isSpace}]+((?:\p{isAlpha}[\p{isSpace}\.]{1,2})+)\p{isSpace}*$/) { print "matched complex\n"; }
> print "success with complex\n";
>
>
>
> [Please do not change anything below this line]
> -----------------------------------------------------------------
> ---
> Flags:
> category=core
> severity=critical
> ---
> Site configuration information for perl v5.8.9:
>
> Configured by _postfix at Wed Jun 24 00:32:40 PDT 2009.
>
> Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
> Platform:
> osname=darwin, osvers=10.0, archname=darwin-thread-multi-2level
> uname='darwin neige.apple.com 10.0 darwin kernel version 10.0.0d8: tue may 5 19:29:59 pdt 2009; root:xnu-1437.2~2release_i386 i386 '
> config_args='-ds -e -Dprefix=/usr -Dccflags=-g -pipe -Dldflags= -Dman3ext=3pm -Duseithreads -Duseshrplib -Dinc_version_list=none -Dcc=gcc-4.2'
> hint=recommended, useposix=true, d_sigaction=define
> usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
> useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
> use64bitint=define use64bitall=define uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='gcc-4.2', ccflags ='-arch i386 -arch ppc -g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include',
> optimize='-Os',
> cppflags='-g -pipe -fno-common -DPERL_DARWIN -fno-strict-aliasing -I/usr/local/include'
> ccversion='', gccversion='4.2.1 (Apple Inc. build 5646)', gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
> ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
> alignbytes=8, prototype=define
> Linker and Libraries:
> ld='gcc-4.2 -mmacosx-version-min=10.6', ldflags ='-arch i386 -arch ppc -L/usr/local/lib'
> libpth=/usr/local/lib /usr/lib
> libs=-ldbm -ldl -lm -lutil -lc
> perllibs=-ldl -lm -lutil -lc
> libc=/usr/lib/libc.dylib, so=dylib, useshrplib=true, libperl=libperl.dylib
> gnulibc_version=''
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
> cccdlflags=' ', lddlflags='-arch i386 -arch ppc -bundle -undefined dynamic_lookup -L/usr/local/lib'
>
> Locally applied patches:
> /Library/Perl/Updates/<version> comes before system perl directories
> installprivlib and installarchlib points to the Updates directory
> 6576362: fixed 5.8.9 binary compatibility issue: perlio mutex not initialized
>
> ---
> @INC for perl v5.8.9:
> /Library/Perl/Updates/5.8.9
> /System/Library/Perl/5.8.9/darwin-thread-multi-2level
> /System/Library/Perl/5.8.9
> /Library/Perl/5.8.9/darwin-thread-multi-2level
> /Library/Perl/5.8.9
> /Network/Library/Perl/5.8.9/darwin-thread-multi-2level
> /Network/Library/Perl/5.8.9
> /Network/Library/Perl
> /System/Library/Perl/Extras/5.8.9/darwin-thread-multi-2level
> /System/Library/Perl/Extras/5.8.9
> /Library/Perl/5.8.8
> /Library/Perl/5.8.6/darwin-thread-multi-2level
> /Library/Perl/5.8.6
> /Library/Perl/5.8.1
> .
>
> ---
> Environment for perl v5.8.9:
> DYLD_LIBRARY_PATH (unset)
> HOME=/Users/cook
> LANG=en_US.UTF-8
> LANGUAGE (unset)
> LD_LIBRARY_PATH (unset)
> LOGDIR (unset)
> PATH=/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:/opt/subversion/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/mysql/bin:/sw/bin:/Volumes/SEA_DISC/NutchStuff/nutch//my_scripts:/opt/local/bin
> PERL_BADLANG (unset)
> SHELL=/bin/bash
>
>
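The reporter's guess, that the regex engine may be backing up by bytes and starting in the middle of a character, can be made concrete by looking at the byte layout of "Böck". The sketch below only illustrates what "the middle of a character" means for this string; it does not show that the regex engine actually does this. It prints the character and byte lengths, the UTF-8 octets, whether byte offset 2 is a continuation byte, and whether strict decoding starting at that offset is accepted. The expected results are 4 characters, 5 bytes, octets `42 C3 B6 63 6B`, offset 2 being a continuation byte, and the decode being rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# "Böck" as a character string (\x{f6} is the o-umlaut).
my $chars = "B\x{f6}ck";

# Encode a copy to UTF-8 octets to see the byte layout.
my $bytes = $chars;
utf8::encode($bytes);

printf "characters: %d\n", length $chars;
printf "bytes:      %d\n", length $bytes;
printf "octets:     %s\n", join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;

# Byte offset 2 holds 0xB6, a continuation byte (bit pattern 10xxxxxx).
my $b2 = ord substr($bytes, 2, 1);
printf "offset 2 is a continuation byte: %s\n", (($b2 & 0xC0) == 0x80) ? 'yes' : 'no';

# Strict decoding that starts at a continuation byte is rejected as malformed.
my $tail = substr($bytes, 2);
my $ok = eval { decode('UTF-8', $tail, FB_CROAK); 1 };
print "decoding from offset 2: ", ($ok ? "accepted" : "rejected"), "\n";
```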
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.