develooper Front page | perl.perl5.porters | Postings from November 2010

[perl #80030] Matching upper ASCII characters from file in RE patterns

From: Jonathan Pool
Date: November 30, 2010 15:18
Subject: [perl #80030] Matching upper ASCII characters from file in RE patterns
Message ID: rt-3.6.HEAD-13564-1291154256-606.80030-75-0@perl.org
# New Ticket Created by  Jonathan Pool 
# Please include the string:  [perl #80030]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=80030 >


The attached script unibug.pl, which reads from the attached file unibug.txt, demonstrates a problem in Perl 5.10.0 which Karl Williamson says is still present in 5.13.7.

The script matches the input line against seven regular-expression patterns, numbered 1 to 7. Patterns 3 and 7 should fail to match; the others should match.

However:

With "use utf8", pattern 3 matches instead of failing.

With "use encoding 'utf8'" (or with both pragmas), pattern 3 matches instead of failing, and patterns 4, 5, and 6 fail instead of matching.

Karl Williamson has provided two additional files for demonstrating this problem: nobreak_utf8.pl and nobreak_latin1.pl.
