perl.perl5.changes
Postings from April 2008
Change 33752: [PATCH] another go; was RE: [perl #49302] [[:print:]] v \p{Print}
From: Rafael Garcia-Suarez
Date: April 26, 2008 15:15
Subject: Change 33752: [PATCH] another go; was RE: [perl #49302] [[:print:]] v \p{Print}
Change 33752 by rgs@scipion on 2008/04/26 22:06:23
Subject: [PATCH] another go; was RE: [perl #49302] [[:print:]] v \p{Print}
From: "Robin Barker" <Robin.Barker@npl.co.uk>
Date: Fri, 25 Apr 2008 14:21:06 +0100
Message-ID: <46A0F33545E63740BC7563DE59CA9C6D093B12@exchsvr2.npl.ad.local>
Affected files ...
... //depot/perl/pod/perlre.pod#145 edit
... //depot/perl/t/op/pat.t#310 edit
Differences ...
==== //depot/perl/pod/perlre.pod#145 (text) ====
Index: perl/pod/perlre.pod
--- perl/pod/perlre.pod#144~33129~ 2008-01-30 09:11:53.000000000 -0800
+++ perl/pod/perlre.pod 2008-04-26 15:06:23.000000000 -0700
@@ -375,20 +375,60 @@
digit IsDigit \d
graph IsGraph
lower IsLower
- print IsPrint
- punct IsPunct
+ print IsPrint (but see [2] below)
+ punct IsPunct (but see [3] below)
space IsSpace
IsSpacePerl \s
upper IsUpper
- word IsWord
+ word IsWord \w
xdigit IsXDigit
For example C<[[:lower:]]> and C<\p{IsLower}> are equivalent.
+However, the equivalence between C<[[:xxxxx:]]> and C<\p{IsXxxxx}>
+is not exact.
+
+=over 4
+
+=item [1]
+
If the C<utf8> pragma is not used but the C<locale> pragma is, the
classes correlate with the usual isalpha(3) interface (except for
"word" and "blank").
+But if the C<locale> or C<encoding> pragmas are not used and
+the string is not C<utf8>, then C<[[:xxxxx:]]> (and C<\w>, etc.)
+will not match characters 0x80-0xff; whereas C<\p{IsXxxxx}> will
+force the string to C<utf8> and can match these characters
+(as Unicode).
+
+=item [2]
+
+C<\p{IsPrint}> matches characters 0x09-0x0d but C<[[:print:]]> does not.
+
+=item [3]
+
+C<[[:punct:]]> matches the following but C<\p{IsPunct}> does not,
+because they are classed as symbols (not punctuation) in Unicode.
+
+=over 4
+
+=item C<$>
+
+Currency symbol
+
+=item C<+> C<< < >> C<=> C<< > >> C<|> C<~>
+
+Mathematical symbols
+
+=item C<^> C<`>
+
+Modifier symbols (accents)
+
+=back
+
+=back
+
The other named classes are:
=over 4
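Note [1] in the new perlre.pod text can be illustrated with a short standalone script. This is a sketch, not part of the change. It assumes that no `use locale`, `use encoding`, or `use feature 'unicode_strings'` (which `use v5.12` or later would enable) is in effect, and it uses `utf8::upgrade` to turn the byte string into a utf8 one.

```perl
#!/usr/bin/perl
# Sketch of note [1]: [[:punct:]] vs \p{IsPunct} on a non-utf8 string.
# Assumption: no locale, no encoding pragma, no unicode_strings feature.
use strict;
use warnings;

my $s = "\xA1";    # U+00A1 INVERTED EXCLAMATION MARK, Unicode category Po

# Byte string: the POSIX class only matches ASCII punctuation here.
my $posix_bytes = ($s =~ /[[:punct:]]/) ? 1 : 0;

# \p{...} always applies Unicode rules, so it matches U+00A1.
my $prop_bytes = ($s =~ /\p{IsPunct}/) ? 1 : 0;

# Once the string is utf8, the POSIX class uses Unicode rules too.
my $u = $s;
utf8::upgrade($u);
my $posix_utf8 = ($u =~ /[[:punct:]]/) ? 1 : 0;

printf "non-utf8 [[:punct:]]:  %d\n", $posix_bytes;
printf "non-utf8 \\p{IsPunct}: %d\n", $prop_bytes;
printf "utf8     [[:punct:]]:  %d\n", $posix_utf8;
```

Under those assumptions the three lines print 0, 1 and 1: only the byte-string POSIX match misses the Latin-1 character.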
==== //depot/perl/t/op/pat.t#310 (xtext) ====
Index: perl/t/op/pat.t
--- perl/t/op/pat.t#309~33686~ 2008-04-15 05:43:02.000000000 -0700
+++ perl/t/op/pat.t 2008-04-26 15:06:23.000000000 -0700
@@ -4604,6 +4604,32 @@
iseq($te[0], '../');
}
+SKIP: {
+ if (ordA == 193) { skip("Assumes ASCII", 4) }
+
+ my @notIsPunct = grep {/[[:punct:]]/ and not /\p{IsPunct}/}
+ map {chr} 0x20..0x7f;
+ iseq( join('', @notIsPunct), '$+<=>^`|~',
+ '[:punct:] disagrees with IsPunct on Symbols');
+
+ my @isPrint = grep {not/[[:print:]]/ and /\p{IsPrint}/}
+ map {chr} 0..0x1f, 0x7f..0x9f;
+ iseq( join('', @isPrint), "\x09\x0a\x0b\x0c\x0d\x85",
+ 'IsPrint disagrees with [:print:] on control characters');
+
+ my @isPunct = grep {/[[:punct:]]/ != /\p{IsPunct}/}
+ map {chr} 0x80..0xff;
+ iseq( join('', @isPunct), "\xa1\xab\xb7\xbb\xbf", # ¡ « · » ¿
+ 'IsPunct disagrees with [:punct:] outside ASCII');
+
+ my @isPunctLatin1 = eval q{
+ use encoding 'latin1';
+ grep {/[[:punct:]]/ != /\p{IsPunct}/} map {chr} 0x80..0xff;
+ };
+ if( $@ ){ skip( $@, 1); }
+ iseq( join('', @isPunctLatin1), '',
+ 'IsPunct agrees with [:punct:] with explicit Latin1');
+}
# Test counter is at bottom of file. Put new tests above here.
@@ -4667,7 +4693,7 @@
# Don't forget to update this!
BEGIN {
- $::TestCount = 4031;
+ $::TestCount = 4035;
print "1..$::TestCount\n";
}
End of Patch.
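The first new pat.t check can also be run on its own. The sketch below is standalone and not taken from the patch. It assumes an ASCII platform (the test itself skips when `ordA == 193`, i.e. on EBCDIC) and lists the ASCII characters that `[[:punct:]]` accepts but `\p{IsPunct}` rejects, then counts those both accept. The control-character and Latin-1 checks are left out because their results may depend on the perl and Unicode versions in use.

```perl
#!/usr/bin/perl
# Sketch mirroring the first pat.t check added by this change.
# Assumption: ASCII platform (not EBCDIC).
use strict;
use warnings;

# ASCII characters the POSIX class accepts but the Unicode property rejects:
# Unicode files these under Symbol (S*) rather than Punctuation (P*).
my @symbols_not_punct =
    grep { /[[:punct:]]/ && !/\p{IsPunct}/ }
    map  { chr } 0x20 .. 0x7E;

my $joined = join '', @symbols_not_punct;
print "$joined\n";    # $+<=>^`|~

# The remaining ASCII punctuation is accepted by both.
my @both =
    grep { /[[:punct:]]/ && /\p{IsPunct}/ }
    map  { chr } 0x20 .. 0x7E;
print scalar(@both), " characters match both\n";    # 23
```

Together the two lists cover all 32 ASCII characters that `[[:punct:]]` matches (9 symbols plus 23 punctuation marks).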