
[perl #129950] Some UTF-8 regular expression matches fail when read from file

From: Hiroshi Manabe via RT
Date: October 24, 2016 20:36
Subject: [perl #129950] Some UTF-8 regular expression matches fail when read from file
Message ID: rt-4.0.24-3450-1477284275-884.129950-14-0@perl.org

On 2016-10-23 21:23:20, manabe.hiroshi@gmail.com wrote:
> You can reproduce the bug with the following procedure:
> 1. perl -CO -e 'print "a\x{e4}";' > foo.txt # aä
> 2. perl -CI -e 'open IN, "<", "foo.txt"; $_ = <IN>; print
> m{^a|a\x{e4}$} ? "matched\n" : "not matched\n";'
> Output: not matched
> 
> This happens only when the string is read from a file handle and the
> second character is in the range of \x{80}-\x{ff}.
> Curiously enough, the match succeeds if the regexp is
> m{^a|a[\x{e3}-\x{e4}]$} or m{^a|a[\x{e4}-\x{e5}]$}, but not if it is
> m{^a|a[\x{e4}-\x{e4}]$}.

Sorry, the bug only reproduces when the alternation is enclosed in a set of parentheses, i.e. m{^(a|a\x{e4})$} etc.
