perl.perl6.internals | Postings from June 2001

Re: Unicode sorting...

From: Jarkko Hietaniemi
Date: June 8, 2001 09:46
Subject: Re: Unicode sorting...
Message ID: 20010608114558.J12606@chaos.wustl.edu

> I can't really believe that this would be a problem, but if they're
> integrated alphabets from different locales, will there be issues
> with sorting (if we're not planning to use the locale)? Are there
> instances where like characters were combined that will affect the
> sort orders?

Yes, it is an issue.  In the general case, you CANNOT sort strings from
several locales/languages into a single order that would satisfy all
of those locales/languages.  One often-quoted example is German and
Swedish/Finnish: the LATIN CAPITAL LETTER A WITH RING ABOVE comes
between A and B in the former but after Z (not immediately, but that
doesn't matter here) in the latter.  Similarly, for all the accented
alphabetic characters the rules for how they are sorted differ from
one place to another, and many languages have special combinations
like ch, ss, ij that require special attention.
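
To make the German versus Swedish/Finnish conflict concrete, here is a
minimal sketch in Python.  Both key functions are toy rules invented
for this illustration (the names german_key and swedish_key and the
word list are assumptions, not part of any library), and neither
handles multi-letter combinations like ch or ij.  They are not real
locale collation, only enough to show that one list gets two
different "correct" orders.

```python
import unicodedata

def german_key(s):
    # Toy German-style rule (assumption): an accented letter sorts with
    # its base letter, so A WITH RING ABOVE lands among the A's.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # The original string is only a tie-breaker between equal base forms.
    return (base.casefold(), s)

# Toy Swedish/Finnish-style alphabet (assumption): the three extra
# letters are separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"
SWEDISH_RANK = {c: i for i, c in enumerate(SWEDISH_ALPHABET)}

def swedish_key(s):
    # Compose first so that A + combining ring becomes a single letter.
    composed = unicodedata.normalize("NFC", s).casefold()
    # Characters outside the toy alphabet sort after everything else.
    return [SWEDISH_RANK.get(c, len(SWEDISH_RANK)) for c in composed]

words = ["Zebra", "\u00c5re", "Bad", "Adam"]

german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)
```

With these toy rules, german_order places the word starting with
A WITH RING ABOVE between "Adam" and "Bad", while swedish_order places
it after "Zebra".  No single ordering of the list satisfies both.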

Unicode defines a default collation order (the Unicode Collation
Algorithm) which has hooks for locale-specific rules:

http://www.unicode.org/unicode/reports/tr10/

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
