
[perl #129950] Some UTF-8 regular expression matches fail when read from file

From: Tony Cook via RT
Date: October 24, 2016 22:57
Subject: [perl #129950] Some UTF-8 regular expression matches fail when read from file
Message ID: rt-4.0.24-15532-1477349835-1561.129950-15-0@perl.org

On Sun Oct 23 21:48:55 2016, manabe.hiroshi@gmail.com wrote:
> On 2016-10-23 21:44:35, manabe.hiroshi@gmail.com wrote:
> > On 2016-10-23 21:23:20, manabe.hiroshi@gmail.com wrote:
> > > You can reproduce the bug with the following procedure:
> > > 1. perl -CO -e 'print "a\x{e4}";' > foo.txt # aä
> > > 2. perl -CI -e 'open IN, "<", "foo.txt"; $_ = <IN>; print
> > > m{^a|a\x{e4}$} ? "matched\n" : "not matched\n";'
> > > Output: not matched
> > >
> > > This happens only when the string is read from a file handle and the
> > > second character is in the range \x{80}-\x{ff}.
> > > Curiously enough, the match succeeds if the regexp is
> > > m{^a|a[\x{e3}-\x{e4}]$} or m{^a|a[\x{e4}-\x{e5}]$}, but not if it is
> > > m{^a|a[\x{e4}-\x{e4}]$}.
> > 
> > Sorry, the bug only reproduces when there is a set of
> > parentheses, i.e. m{^(a|a\x{e4})$} etc.
> 
> Sorry again, the correct Unicode option for step 2 was -Ci.

The string doesn't need to be from a file:

$ ./perl -e '$_ = "a\xE4"; utf8::upgrade($_); print m{^(a|a\x{e4})$} ? "matched\n" : "not matched\n";'
not matched

(blead perl)
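
For comparison, here is a minimal sketch (not part of the original report) that runs the same pattern against the string before and after utf8::upgrade(). The helper name check() and the variable names are made up for illustration. Both strings hold the same two characters and compare equal, so on a perl without this bug both lines should report the same result; on an affected build the upgraded one would be expected to differ, as in the one-liner above.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether the pattern from the ticket matches.
sub check {
    my ($str) = @_;
    return $str =~ m{^(a|a\x{e4})$} ? "matched" : "not matched";
}

my $bytes    = "a\xE4";       # stored internally as native 8-bit characters
my $upgraded = "a\xE4";
utf8::upgrade($upgraded);     # same characters, stored internally as UTF-8

printf "not upgraded: %s\n", check($bytes);
printf "upgraded:     %s\n", check($upgraded);

# The internal representation differs, but the strings are the same.
printf "is_utf8: %d vs %d, eq: %s\n",
    (utf8::is_utf8($bytes) ? 1 : 0),
    (utf8::is_utf8($upgraded) ? 1 : 0),
    ($bytes eq $upgraded ? "yes" : "no");
```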

The match is failing around line 5611 of regexec.c:

    if (   trie->bitmap
        && (NEXTCHR_IS_EOS || !TRIE_BITMAP_TEST(trie, nextchr)))
    {
        if (trie->states[ state ].wordnum) {
            DEBUG_EXECUTE_r(
                Perl_re_exec_indentf( aTHX_  "%smatched empty string...%s\n",
                              depth, PL_colors[4], PL_colors[5])
            );

At this point nextchr has the first byte of the UTF-8 encoded \xE4 (0xc3).
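
To see where 0xc3 comes from, this small sketch (an illustration added here, not taken from regexec.c) encodes the character \xE4 as UTF-8 and prints the resulting bytes. The variable names are arbitrary.

```perl
use strict;
use warnings;

my $encoded = "\xE4";      # one character, U+00E4
utf8::encode($encoded);    # in place: replace characters with their UTF-8 bytes

# Format each byte of the encoded string as hex.
my @bytes = map { sprintf("0x%02x", ord($_)) } split //, $encoded;
print "UTF-8 bytes of U+00E4: @bytes\n";   # prints: 0xc3 0xa4
```

The first byte, 0xc3, is the value nextchr holds when the string is stored as UTF-8, while the pattern's \x{e4} is the code point 0xe4.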

Tony


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=129950
