
Re: Perl-Unicode fundamentals (was Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2))

Nick Ing-Simmons
February 21, 2001 08:48
Ilya Zakharevich writes:
>Of course.
>So it "just
>works", ... 

So we seem to have cleared up a few things.

>> Our locale story is nowhere near as good as our Unicode story.
>> But that is mostly the fault of under-specified locale semantics 
>> at system level.
>No, the faults are at different places:
>  a) use locale is lexically scoped, so useless when modules are used;
>  b) there was no defined semantics for the interaction of locale and
>     Unicode [my proposal creates such a semantics];
>> Switching on EBCDIC-ness is cleaner.
>There is no difference (as far as Perl is concerned; except for
>sorting) between EBCDIC-ness and locale.  If you feel otherwise,
>please give an example to unconfuse me.

EBCDIC-ness is C-compile-time (./Configure time even) knowable.
So it does not suffer from "lexical" issues as in your (a) above.
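
A minimal sketch of what "C-compile-time knowable" can look like, assuming
only that the preprocessor evaluates character constants in the same
character set the compiled program will use (the C standard leaves that
implementation-defined, so treat this as an illustration, not as how
./Configure or the perl sources actually decide). The names SKETCH_IS_EBCDIC
and sketch_is_ebcdic are hypothetical:

```c
#include <stdio.h>

/* 'A' is 0xC1 in EBCDIC code pages such as ibm-1047 and 0x41 in ASCII /
 * iso-8859-1, so the choice is fixed when the file is compiled.
 * SKETCH_IS_EBCDIC is a made-up name used for illustration only. */
#if 'A' == 0xC1
#  define SKETCH_IS_EBCDIC 1
#else
#  define SKETCH_IS_EBCDIC 0
#endif

/* Returns 1 on an EBCDIC build and 0 otherwise. The answer is a constant
 * of the build; nothing is consulted at run time and no scope can change it. */
static int sketch_is_ebcdic(void)
{
    return SKETCH_IS_EBCDIC;
}

/* Prints which character set this particular build assumed. */
static void sketch_report(FILE *out)
{
    fprintf(out, "native character set: %s\n",
            sketch_is_ebcdic() ? "EBCDIC" : "ASCII-based");
}
```

Calling sketch_report(stdout) from any module gives the same answer, which
is the contrast with a lexically scoped 'use locale'.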

So far I have avoided 'use locale' in all my descriptions.
So it seems we can document transparency and Unicode in the abstract
for iso-8859-1/Unicode or EBCDIC-ibm-1047/Unicode without using 
any "locale" analogies, assumptions etc.
This is a good thing.

When we have "transparent Unicode" in place, the brave and enthusiastic
can go look at what we should/could do to "use locale" in the new realm.
But let us put that part on one side for now and get the basics good
and solid - does that make sense to you?

>> use utf8;
>> still has the semantics that the script itself is assumed to come
>> from a UTF-8 encoded source file.
>use utf8 is a mastodon.  
   Mastodon as in:
    A. Large
    B. Hairy
    C. Extinct? ;-)

>It is not needed for any other purpose, so
>let it be so.

>> big5 has other problems in that it is a multi-byte encoding
>Does not matter: I discuss character mapping here, not encoding.

Agreed - I said _other_ problems for that reason.
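
To make the mapping/encoding distinction concrete, here is a small sketch,
using iso-8859-1 and UTF-8 rather than big5 because the iso-8859-1 mapping
is the identity and needs no table. The function names are hypothetical and
this is not perl's internal code; the encoder follows the standard UTF-8
layout and rejects surrogates and values above U+10FFFF, which is an
assumption of this sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Mapping: which Unicode code point a legacy byte stands for.
 * For iso-8859-1 the byte value is the code point. */
static uint32_t sketch_latin1_to_codepoint(unsigned char byte)
{
    return (uint32_t)byte;
}

/* Encoding: how a code point is laid out as bytes. Writes the UTF-8 form
 * of cp into out and returns the number of bytes used, or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
static size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

The iso-8859-1 byte 0xE9 maps to U+00E9; that is the mapping question.
That U+00E9 then occupies the two bytes 0xC3 0xA9 in UTF-8 is the encoding
question, and it can be answered without knowing where the code point
came from.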

Nick Ing-Simmons
Via, but not speaking for: Texas Instruments Ltd.