Re: Unicode sorting...
From: Jarkko Hietaniemi
Date: June 8, 2001 09:46
Subject: Re: Unicode sorting...
Message ID: 20010608114558.J12606@chaos.wustl.edu
> I can't really believe that this would be a problem, but if they're
> integrated alphabets from different locales, will there be issues
> with sorting (if we're not planning to use the locale)? Are there
> instances where like characters were combined that will affect the
> sort orders?
Yes, it is an issue. In the general case, you CANNOT sort strings of
several locales/languages into a single order that would satisfy all
of the locales/languages. One often-quoted example is German and
Swedish/Finnish: the LATIN CAPITAL LETTER A WITH RING ABOVE comes
between A and B in the former but after Z (not immediately, but that
doesn't matter here) in the latter. Similarly, for all the accented
alphabetic characters, the rules for how they are sorted differ from
one place to another, and many languages have special combinations
like ch, ss, ij that require special attention.
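To make the conflict concrete, here is a minimal Python sketch. Both
key functions are toy assumptions made up for illustration, not real
locale data: one treats the ring/umlaut letters as separate letters
after Z (roughly the Swedish/Finnish convention), the other folds them
onto their base letter (roughly the German convention). The names
swedish_like_key and german_like_key are hypothetical.

```python
# Toy collation keys: both rule sets are simplified assumptions,
# not real locale data.
BASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def swedish_like_key(word):
    # Å, Ä, Ö are treated as separate letters placed after Z.
    alphabet = BASE + "ÅÄÖ"
    return [alphabet.index(ch) for ch in word.upper()]

def german_like_key(word):
    # Accented letters sort with their base letter; the original
    # spelling is used only as a tie-breaker.
    fold = {"Å": "A", "Ä": "A", "Ö": "O"}
    primary = [BASE.index(fold.get(ch, ch)) for ch in word.upper()]
    return (primary, word)

words = ["ZON", "ÅL", "AMT", "BO"]
swedish_order = sorted(words, key=swedish_like_key)
german_order = sorted(words, key=german_like_key)
print(swedish_order)  # the Å word lands after ZON
print(german_order)   # the Å word lands among the A words, before BO
```

The same four strings come out in two different orders, and neither
order is acceptable under the other rule set. Multi-character
combinations such as ch or ij would need keys built over letter
sequences rather than single characters, which this sketch does not
attempt.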
Unicode defines a default collation order (the Unicode Collation
Algorithm) which has hooks for locale-specific tailoring:
http://www.unicode.org/unicode/reports/tr10/
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen