perl.perl5.porters | Postings from May 2004
[unicore/mktables] some tests in TestProp.pl fail
From: SADAHIRO Tomoyuki
Date: May 1, 2004 06:48
Subject: [unicore/mktables] some tests in TestProp.pl fail
Message ID: 20040501225021.0C09.BQW10602@nifty.com
Hello,
After the recent enhancement of Unicode properties, some of the tests
generated as TestProp.pl (by "perl ./mktables -maketest") fail.
I'll omit the long output except for the summary:
11310 tests, 150 failed!
This is because some property names have been changed to be "fuzzy"
(matched case-insensitively, with spaces etc. ignored), whereas TestProp.pl
expects them to be "exact".
E.g. \p{ll} for \p{Ll} used to be rejected but is now accepted.
This snippet reveals the change:
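#!perl
# Lowercase each property name. If perl rejects the lowercased form,
# eval sets $@ and the name is matched "exact"; otherwise it is "fuzzy".
print $], "\n";
for my $P (qw/Ll Lu Alpha Alnum Lower Punct Upper/) {
    eval qq{ ' ' =~ /\\p{\L$P\E}/ };
    print $@ ? "exact" : "fuzzy", " # $P\n";
}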
__END__
5.009002 (perl-current)
fuzzy # Ll
fuzzy # Lu
fuzzy # Alpha
exact # Alnum
fuzzy # Lower
exact # Punct
fuzzy # Upper
5.008004
exact # Ll
exact # Lu
exact # Alpha
exact # Alnum
exact # Lower
exact # Punct
exact # Upper
The property names whose behavior has changed fall into the
following three groups:
(1) General category names like L, Lu, etc.
(2) The script name 'Thai'.
    Perhaps this is because it is the only script whose name is
    the same as its 4-letter script code.
(3) Three of the POSIX class names: Alpha, Lower, Upper.
    They are confused with the abbreviated names of "Alphabetic",
    "Lowercase", and "Uppercase" in PropertyAliases.txt.
This leads to another oddity:
In the current implementation, the POSIX-style \p{Alpha}, \p{Lower}, and
\p{Upper} are not equivalent to the Unicode \p{Alphabetic}, \p{Lowercase},
and \p{Uppercase}. As a result, \p{Alpha} (treated as [:alpha:]) and
\p{alpha} (loosely matched to \p{Alphabetic}) are inconsistent:
#!perl
use charnames ":full";
# U+2160 ROMAN NUMERAL ONE has General_Category Nl: it is Alphabetic
# but it is not a letter.
for my $P (qw/ Alpha alpha ALPHA Alphabetic alphabetic/ ) {
    my $ret = eval qq{ "\N{ROMAN NUMERAL ONE}" =~ /\\p{$P}/ };
    print $@ ? "error" : $ret ? "is" : "is not", " $P\n";
}
__END__
is not Alpha
is alpha
is ALPHA
is Alphabetic
is alphabetic
Unicode Technical Standard #18 says
(cf. http://www.unicode.org/reports/tr18/#Categories):
The recommended names for UCD properties and property values are
in PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
There are both abbreviated names and longer, more descriptive names.
It is strongly recommended that both names be recognized, and that
loose matching of property names be used, whereby the case distinctions,
whitespace, hyphens, and underbar are ignored.
So *all* the property names may be fuzzy.
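To make the consequence concrete, here is a minimal sketch of UTS #18-style loose matching, assuming that a name is normalized by lowercasing it and dropping whitespace, hyphens, and underbars. The names `loose_name`, `%long_for`, and `%resolve` are hypothetical, and only three short/long alias pairs from PropertyAliases.txt are hard-coded; this is not how mktables actually builds its tables. Because every spelling is reduced to one key, "Alpha" and "alpha" necessarily resolve to the same property.

```perl
use strict;
use warnings;

# Hypothetical helper: loose matching per UTS #18, i.e. ignore case,
# whitespace, hyphens and underbars when comparing property names.
sub loose_name {
    my ($name) = @_;
    (my $key = lc $name) =~ s/[\s_-]+//g;
    return $key;
}

# A tiny, hand-copied subset of PropertyAliases.txt (short => long).
my %long_for = (
    Alpha => 'Alphabetic',
    Lower => 'Lowercase',
    Upper => 'Uppercase',
);

# Index both the short and the long name under their loose keys,
# so that either spelling leads to the long property name.
my %resolve;
for my $short (keys %long_for) {
    my $long = $long_for{$short};
    $resolve{ loose_name($short) } = $long;
    $resolve{ loose_name($long) }  = $long;
}

# Every spelling ends up at the same property, so under loose
# matching \p{Alpha} and \p{alpha} cannot disagree.
for my $p (qw/Alpha alpha ALPHA Alphabetic alpha_betic Lower-case/) {
    my $key = loose_name($p);
    printf "%-12s => %s\n", $p,
        exists $resolve{$key} ? $resolve{$key} : 'unknown';
}
```

The design choice is to normalize once, on both the table keys and the lookup name, rather than trying case-insensitive comparisons at lookup time.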
Either way, I think TestProp.pl should report "All tests passed".
Regards,
SADAHIRO Tomoyuki