
[perl #69166] Regex: \p{Lu} and \P{Lu} are not inverses when /i is specified

From: Philip Hazel
Date: September 16, 2009 05:28
Subject: [perl #69166] Regex: \p{Lu} and \P{Lu} are not inverses when /i is specified
Message ID: rt-3.6.HEAD-21832-1253093207-1497.69166-75-0@perl.org
# New Ticket Created by  Philip Hazel 
# Please include the string:  [perl #69166]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=69166 >


Hello,

It appears that in both 5.008008 and 5.010001 (the releases of 5.8 and 
5.10 that I have), \p{Lu} and \P{Lu} do not match as inverses of each 
other when /i is specified. Please see the test program below. With /i, 
the output is:

Perl 5.010001 Regular Expressions
--abAB
--

Without the /i, the output is:

Perl 5.010001 Regular Expressions
--abAB
AB--AB

(and the same for Perl 5.8). There's a similar problem with \p{Ll}.

There doesn't seem to be anything in the documentation about the effect
of /i on \p{Lu} and \p{Ll}. I expected one of two behaviours. Either /i 
would have no effect (which I think is what it should do, since the 
pattern explicitly mentions the case), or, as a logical alternative, Lu 
and Ll would both be treated as plain L in the presence of /i (but I 
think this is less useful).

Regards,
Philip

-- 
Philip Hazel


####################################################################
print "Perl $] Regular Expressions\n";

# Matching upper case letters works case-dependently, even in the
# presence of /i. This makes sense.

$x = "ABabAB";
$x =~ s/\p{Lu}+/--/i;
print $x, "\n";

# However, the "not" version seems to work incorrectly, matching all
# the letters.

$x = "ABabAB";
$x =~ s/\P{Lu}+/--/i;
print $x, "\n";

####################################################################
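The inverse property described above can also be checked one character at a time: for any single character, exactly one of \p{Lu} and \P{Lu} should match. The following is only a minimal sketch of such a check, not part of the original report. The helper name non_inverse_chars and the sample string are hypothetical, and what it prints with /i is likely to depend on the Perl version in use.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original report): return the
# characters of $chars for which \p{Lu} and \P{Lu} are NOT inverses,
# i.e. both match or neither matches. $ignore_case selects /i.
sub non_inverse_chars {
    my ($chars, $ignore_case) = @_;

    my ($pos, $neg) = $ignore_case
        ? (qr/\A\p{Lu}\z/i, qr/\A\P{Lu}\z/i)
        : (qr/\A\p{Lu}\z/,  qr/\A\P{Lu}\z/);

    my @bad;
    for my $c (split //, $chars) {
        my $p = ($c =~ $pos) ? 1 : 0;
        my $n = ($c =~ $neg) ? 1 : 0;
        # Inverses means exactly one of the two matches.
        push @bad, $c if $p == $n;
    }
    return @bad;
}

for my $flag (0, 1) {
    my @bad   = non_inverse_chars("ABab", $flag);
    my $label = $flag ? "with /i" : "without /i";
    my $note  = @bad ? "not inverses for: @bad" : "inverses on every character";
    print "$label: $note\n";
}
```

Any character listed on the "with /i" line is one where the two properties overlap or both fail, which is the behaviour this report is about.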



