develooper Front page | perl.perl6.internals | Postings from June 2001

Re: The internal string API

Dan Sugalski
June 29, 2001 07:49
At 07:57 PM 6/28/2001 -0500, Jarkko Hietaniemi wrote:
>On Fri, Jun 29, 2001 at 02:52:03AM +0200, Bart Lateur wrote:
> > If I have a file in French, and a file in Chinese, I want one to
> > be treated as French, and the other as Chinese.
>And what do you do once you have a list of, say, employees, with
>French, Chinese, and Spanish names, and you want to show them in
>some order, and how does your fellow Chinese or Hindi worker
>want to see the same list ordered...?
>Also, please don't confuse locales with 'languages'.  To start with,
>there's no definition of 'language' that people can agree on.  Usually
>the existing locale definitions try to work around this fuzziness by
>having (language,country) pairs, but that is just a partial solution.

We're going to split things out into pieces internally.

* String data will be tagged sufficiently to make the characters uniquely 
identified. That means we'll see Unicode, Big5/Traditional, Shift-JIS, or 
whatever. This will do nothing except make sure that we know what the heck 
character 0x0455 is.

* String sorting order will be specifiable, overridable, and generally 
pluggable. As has been pointed out, you sort German, French, and English 
text a little differently from each other. (Not to mention things like 
Arabic, Chinese, or Egyptian hieroglyphics.) That's not really bound to the 
character encoding of the data, as the same data will be sorted differently 
by different people.

* Formatting bits will be a separate thing entirely as well. How numbers 
and dates and such are formatted varies even more than how data's sorted, it 
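
To make the split between "what the characters are" and "how they sort" concrete, here is a rough sketch in Python. It is not the planned internal API. TaggedString, COLLATORS, and sort_as are names made up for illustration, and the two collation rules are deliberately simplified stand-ins for real German and Swedish ordering.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class TaggedString:
    """Hypothetical: raw bytes plus a tag saying which character set they use."""
    data: bytes
    charset: str  # e.g. "utf-8", "latin-1", "big5", "shift_jis"

    def text(self) -> str:
        # The tag only tells us what the characters are, nothing about ordering.
        return self.data.decode(self.charset)

# Simplified rule: German dictionary order treats umlauts like base letters.
_GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def german_key(s: str):
    return s.casefold().translate(_GERMAN_FOLD)

# Simplified rule: Swedish puts å, ä, ö after z as separate letters.
_SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

def swedish_key(s: str):
    return [
        _SWEDISH_ALPHABET.index(c) if c in _SWEDISH_ALPHABET
        else len(_SWEDISH_ALPHABET) + ord(c)
        for c in s.casefold()
    ]

# The pluggable part: a registry anyone can add to or override.
COLLATORS: Dict[str, Callable[[str], object]] = {
    "german": german_key,
    "swedish": swedish_key,
}

def sort_as(order: str, items: List[TaggedString]) -> List[str]:
    """Sort the same data under whichever ordering the caller asks for."""
    return sorted((item.text() for item in items), key=COLLATORS[order])

words = [
    TaggedString("zebra".encode("utf-8"), "utf-8"),
    TaggedString("äpple".encode("latin-1"), "latin-1"),
    TaggedString("apa".encode("utf-8"), "utf-8"),
]
print(sort_as("german", words))
print(sort_as("swedish", words))
```

The same three strings, stored in two different encodings, come out in a different order depending only on which collator is named, which is the point of keeping the two concerns apart.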

Now, we'll probably have some sort of locale specification that sets the 
default encoding for unknown incoming data, default sorting order, and 
default formats for things we format, but that'll all be overridable. How 
it's overridable at the language level's Larry's issue ("sort as greek @foo" 
maybe) but we'll definitely do it. And yes, I know we could just make 
people hand-format and stick in sort subs, but bleah. Too much work on the 
part of a perl programmer.
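
As a rough illustration of "defaults from a locale spec, all overridable", here is a small Python sketch. Everything in it (LocaleDefaults, decode_incoming, format_number, the field names) is hypothetical and only meant to show the shape of the idea, not how perl will actually do it.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class LocaleDefaults:
    """Hypothetical bundle of the three independent defaults."""
    encoding: str = "utf-8"       # assumed for unknown incoming data
    collation: str = "codepoint"  # default sorting order, by name
    decimal_mark: str = "."       # one example of a formatting default

def decode_incoming(data: bytes, loc: LocaleDefaults,
                    encoding: Optional[str] = None) -> str:
    # A per-call override wins; otherwise fall back to the locale default.
    return data.decode(encoding or loc.encoding)

def format_number(value: float, loc: LocaleDefaults) -> str:
    # Formatting is looked up separately from encoding and collation.
    return f"{value:.2f}".replace(".", loc.decimal_mark)

base = LocaleDefaults()

# Override just one piece; the other defaults are untouched.
greek_sorting = replace(base, collation="greek")
comma_decimals = replace(base, decimal_mark=",")

print(greek_sorting.collation, greek_sorting.encoding)
print(format_number(3.5, base), format_number(3.5, comma_decimals))
print(decode_incoming("café".encode("latin-1"), base, encoding="latin-1"))
```

Each default can be swapped without touching the others, so nobody has to hand-format or pass a sort sub just to change one behavior.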


--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
                                      have teddy bears and even
                                      teddy bears get drunk
