> #!/usr/local/bin/perl -w
>
> {
> my ($e_accute_utf) = my ($e_accute) = chr 0xE9;
> $e_accute_utf .= chr 300;
> chop $e_accute_utf;
> my $E_accute = uc $e_accute;
> my $E_accute_utf = uc $e_accute_utf;
>
> if ($e_accute_utf eq $e_accute) {
> print "ok\n";
> } else {
> print "not ok # '$e_accute_utf' ne '$e_accute'\n";
> }
> if ($E_accute_utf eq $E_accute) {
> print "ok # '$E_accute_utf' eq '$E_accute'\n";
> } else {
> print "not ok # '$E_accute_utf' ne '$E_accute'\n";
Whether this works is locale-dependent: $E_accute is
uc $e_accute, $e_accute is a pure 8-bit character, and
whether uc upcases $e_accute to $E_accute depends on the
locale settings.
For example, in my Finnish locale the test fails, because
$E_accute stays lowercase. But switching the locale helps:
LC_ALL=fr_FR.ISO8859-1 ./perl -Ilib -Mlocale t1
ok
ok # 'É' eq 'É'
The $...utf version works because it obeys the Unicode lower/uppercase
rules, but the fact that it was mapped to Unicode correctly in the first
place is purely incidental: 0xE9 happened to be Latin-1, which happens
to be the lowest 256-character 'page' of Unicode.
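A minimal sketch of the same asymmetry, without the chr 300/chop trick. This is an illustration only and rests on assumptions: uc_ord is a hypothetical helper that is not in the original test, and utf8::upgrade stands in for the append-and-chop step to force the UTF-8 internal form. With neither locale nor the unicode_strings feature in effect, I would expect the two uc results to differ, though the exact behaviour depends on the perl version.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original test): uppercase a copy of
# the string and return the code point of its first character.
sub uc_ord {
    my ($s) = @_;
    return ord uc $s;
}

my $bytes = chr 0xE9;      # e-acute held as a plain 8-bit character
my $chars = chr 0xE9;
utf8::upgrade($chars);     # same character, now in the UTF-8 internal form

# The two strings compare equal; only the internal representation differs.
print "same string: ", ($bytes eq $chars ? "yes" : "no"), "\n";

# uc on the 8-bit form follows byte (or locale) rules; uc on the UTF-8
# form follows the Unicode case mapping.
printf "uc of 8-bit form: 0x%X\n", uc_ord($bytes);
printf "uc of UTF-8 form: 0x%X\n", uc_ord($chars);
```

Running the same script with -Mlocale under different LC_ALL settings should show the 8-bit result changing while the UTF-8 result stays put.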
Summary: the bug cannot be solved without creative application of
high-yield explosives to locales.
> }
> }
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen