On Sun Oct 23 21:48:55 2016, manabe.hiroshi@gmail.com wrote:
> On 2016-10-23 21:44:35, manabe.hiroshi@gmail.com wrote:
> > On 2016-10-23 21:23:20, manabe.hiroshi@gmail.com wrote:
> > > You can reproduce the bug with the following procedure:
> > > 1. perl -CO -e 'print "a\x{e4}";' > foo.txt # aä
> > > 2. perl -CI -e 'open IN, "<", "foo.txt"; $_ = <IN>; print
> > > m{^a|a\x{e4}$} ? "matched\n" : "not matched\n";'
> > > Output: not matched
> > >
> > > This happens only when the string is read from a file handle and the
> > > second character is in the range \x{80}-\x{ff}.
> > > Curiously enough, the match succeeds if the regexp is m{^a|a[\x{e3}-
> > > \x{e4}]$} or m{^a|a[\x{e4}-\x{e5}]$}, but not if it is m{^a|a[\x{e4}-
> > > \x{e4}]$}.
> >
> > Sorry, the bug only reproduces itself when there is a set of
> > parentheses, i.e. m{^(a|a\x{e4})$} etc.
>
> Sorry again, the correct unicode option for step 2 was -Ci.

The string doesn't need to come from a file:

$ ./perl -e '$_ = "a\xE4"; utf8::upgrade($_); print m{^(a|a\x{e4})$} ? "matched\n" : "not matched\n";'
not matched

(blead perl)

The match is failing around line 5611 of regexec.c:

    if ( trie->bitmap
         && (NEXTCHR_IS_EOS || !TRIE_BITMAP_TEST(trie, nextchr)))
    {
        if (trie->states[ state ].wordnum) {
            DEBUG_EXECUTE_r(
                Perl_re_exec_indentf( aTHX_ "%smatched empty string...%s\n",
                              depth, PL_colors[4], PL_colors[5])
            );

At this point nextchr holds the first byte of the UTF-8 encoded \xE4 (0xc3).

Tony

---
via perlbug: queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=129950