perl.perl5.porters

[unicore/mktables] some tests in fail

May 1, 2004 06:48

After the recent enhancement of Unicode properties, some tests in the
test script (auto-generated by "perl ./mktables -maketest") fail.

I'll omit its long output and show only the summary:

    11310 tests, 150 failed!

This is because some property names have become "fuzzy" (case-insensitive,
spaces ignored, etc.), though the test script expects them to be "exact".
E.g. \p{ll} for \p{Ll} was not allowed before but is now accepted.

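This snippet reveals the change:

    print $], "\n";
    for my $P (qw/Ll Lu Alpha Alnum Lower Punct Upper/) {
        # \L...\E lowercases the name; a compile error means the
        # lowercased form is rejected, i.e. the name is "exact".
        eval qq{ ' ' =~ /\\p{\L$P\E}/ };
        print $@ ? "exact" : "fuzzy", " # $P\n";
    }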
Output:

    5.009002 (perl-current)
    fuzzy # Ll
    fuzzy # Lu
    fuzzy # Alpha
    exact # Alnum
    fuzzy # Lower
    exact # Punct
    fuzzy # Upper

With the previous behavior, every name is exact:

    exact # Ll
    exact # Lu
    exact # Alpha
    exact # Alnum
    exact # Lower
    exact # Punct
    exact # Upper

The property names whose behavior has changed fall into the
following three groups, (1) to (3).

(1) General category names like L, Lu, etc.

(2) 'Thai' script name.
    Perhaps because this is the only script whose name is the same as
    its 4-letter script code.

(3) Three of the POSIX class names: Alpha, Lower, Upper.
    They are confused with the abbreviated names that PropertyAliases.txt
    gives for "Alphabetic", "Lowercase", "Uppercase".

    This leads to another oddity:
    in the current implementation, POSIX \p{Alpha}, \p{Lower}, \p{Upper}
    are not equivalent to Unicode \p{Alphabetic}, \p{Lowercase}, \p{Uppercase}.

    So \p{Alpha} (as [:alpha:]) and \p{alpha} (as \p{Alphabetic}) are inconsistent.

    use charnames ":full";
    for my $P (qw/ Alpha alpha ALPHA Alphabetic alphabetic/ ) {
        # ROMAN NUMERAL ONE is used as the probe character
        my $ret = eval qq{ "\N{ROMAN NUMERAL ONE}" =~ /\\p{$P}/ };
        print $@ ? "error" : $ret ? "is" : "is not", " $P\n";
    }

Output:

    is not Alpha
    is alpha
    is Alphabetic
    is alphabetic
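
The probe above checks a single character. To see how far two property names diverge over a range, something like the following could be used. It is only a rough sketch: property_diff is a hypothetical helper, not part of mktables or the test script, and the result depends on the perl that runs it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the code points in [$from, $to] on which
# \p{$p1} and \p{$p2} disagree. Dies if either name fails to compile.
sub property_diff {
    my ($p1, $p2, $from, $to) = @_;
    my @re;
    for my $p ($p1, $p2) {
        my $pat = '\p{' . $p . '}';
        my $re  = eval { qr/\A$pat\z/ };
        die "cannot compile $pat: $@" unless defined $re;
        push @re, $re;
    }
    my @diff;
    for my $cp ($from .. $to) {
        my $ch = chr $cp;
        # keep the code point when exactly one of the two properties matches
        push @diff, $cp if (($ch =~ $re[0]) xor ($ch =~ $re[1]));
    }
    return @diff;
}

# Example: compare the two spellings around the Roman numerals block.
my @diff = property_diff('Alpha', 'Alphabetic', 0x2150, 0x218F);
printf "%d code points differ\n", scalar @diff;
printf "U+%04X\n", $_ for @diff;
```

An empty list means the two names select the same characters within the range tested; it says nothing about code points outside that range.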

Unicode Technical Standard #18 says:

     The recommended names for UCD properties and property values are
     in PropertyAliases.txt [Prop] and PropertyValueAliases.txt [PropValue].
     There are both abbreviated names and longer, more descriptive names.
     It is strongly recommended that both names be recognized, and that
     loose matching of property names be used, whereby the case distinctions,
     whitespace, hyphens, and underbar are ignored.
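
The loose matching described in this passage can be sketched as a name normalization. This is only an illustration of the rule as quoted (case, whitespace, hyphens and underscores ignored); loose_key and loose_eq are hypothetical helpers and are not how perl or mktables implements the lookup.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a property name to a loose-matching key by
# lowercasing it and dropping whitespace, hyphens and underscores.
sub loose_key {
    my ($name) = @_;
    (my $key = lc $name) =~ s/[\s_-]+//g;
    return $key;
}

# Two names match loosely when their keys are equal.
sub loose_eq {
    my ($x, $y) = @_;
    return loose_key($x) eq loose_key($y);
}

for my $name ('Ll', 'll', 'Lowercase_Letter', 'lowercase letter', 'LOWERCASE-LETTER') {
    printf "%-18s => %s\n", $name, loose_key($name);
}
```

Under this rule \p{ll} and \p{Ll} share one key, which is the "fuzzy" behavior shown earlier.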

So *all* the property names may be fuzzy.
In any case, I think the test script should result in "All tests passed".

SADAHIRO Tomoyuki