On 25 October 2016 at 12:12, Dave Mitchell <davem@iabyn.com> wrote:
> On Mon, Oct 24, 2016 at 03:57:15PM -0700, Tony Cook via RT wrote:
>> On Sun Oct 23 21:48:55 2016, manabe.hiroshi@gmail.com wrote:
>> > On 2016-10-23 21:44:35, manabe.hiroshi@gmail.com wrote:
>> > > On 2016-10-23 21:23:20, manabe.hiroshi@gmail.com wrote:
>> > > > You can reproduce the bug with the following procedure:
>> > > > 1. perl -CO -e 'print "a\x{e4}";' > foo.txt # aä
>> > > > 2. perl -CI -e 'open IN, "<", "foo.txt"; $_ = <IN>; print
>> > > > m{^a|a\x{e4}$} ? "matched\n" : "not matched\n";'
>> > > > Output: not matched
>> > > >
>> > > > This happens only when the string is read from a file handle and the
>> > > > second character is in the range \x{80}-\x{ff}.
>> > > > Curiously enough, the match succeeds if the regexp is m{^a|a[\x{e3}-
>> > > > \x{e4}]$} or m{^a|a[\x{e4}-\x{e5}]$}, but not if it is m{^a|a[\x{e4}-
>> > > > \x{e4}]$}.
>> > >
>> > > Sorry, the bug only reproduces when there is a set of
>> > > parentheses, i.e. m{^(a|a\x{e4})$} etc.
>> >
>> > Sorry again, the correct Unicode option for step 2 was -Ci.
>>
>> The string doesn't need to be from a file:
>>
>> $ ./perl -e '$_ = "a\xE4"; utf8::upgrade($_); print m{^(a|a\x{e4})$} ? "matched\n" : "not matched\n";'
>> not matched
>>
>> (blead perl)
>>
>> The match is failing around line 5611 of regexec.c:
>>
>>     if ( trie->bitmap
>>          && (NEXTCHR_IS_EOS || !TRIE_BITMAP_TEST(trie, nextchr)))
>>     {
>>         if (trie->states[ state ].wordnum) {
>>             DEBUG_EXECUTE_r(
>>                 Perl_re_exec_indentf( aTHX_ "%smatched empty string...%s\n",
>>                     depth, PL_colors[4], PL_colors[5])
>>             );
>>
>> At this point nextchr has the first byte of the UTF-8 encoded \xE4 (0xc3).
>
> I'm looking into this as we speak.

I was going to look into it later as well. Let me know how far you get.

We used to preload the bitmap with the first byte of the Unicode
representation of the string, but I guess I can leave it to you. Let me
know otherwise.
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"