[perl #129950] Some UTF-8 regular expression matches fail when read from file

From: Hiroshi Manabe
Date: October 24, 2016 20:36
Subject: [perl #129950] Some UTF-8 regular expression matches fail when read from file
Message ID: rt-4.0.24-3450-1477283000-1688.129950-75-0@perl.org

# New Ticket Created by  Hiroshi Manabe 
# Please include the string:  [perl #129950]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=129950 >


You can reproduce the bug with the following procedure:
1. perl -CO -e 'print "a\x{e4}";' > foo.txt # aä
2. perl -CI -e 'open IN, "<", "foo.txt"; $_ = <IN>; print m{^a|a\x{e4}$} ? "matched\n" : "not matched\n";'
Output: not matched

This happens only when the string is read from a file handle and the second character is in the range \x{80}-\x{ff}.
Curiously enough, the match succeeds if the regexp is m{^a|a[\x{e3}-\x{e4}]$} or m{^a|a[\x{e4}-\x{e5}]$}, but not if it is m{^a|a[\x{e4}-\x{e4}]$}.

