perl.perl5.porters | Postings from February 2001

Re: [ID 20010225.003] inconsistencies in locale support

From: andrew
Date: February 26, 2001 15:32
Subject: Re: [ID 20010225.003] inconsistencies in locale support
Message ID: 20010226183150.X17705@pimlott.ne.mediaone.net

I'll tell you what I found out, then answer what Nick says, because
it (too) raises interesting points.

To recap, the problem was that lc honors locale (under "use
locale"), but POSIX::tolower does not.  Interestingly,
POSIX::isalpha does honor locale.  You can test this trivially by
comparing the result of lc, tolower, and isalpha on chr 0xc0, in the
"C" (ASCII) locale versus the en_US locale (assuming your en_US
locale is ISO-8859-1, as it is by default under GNU libc).

The reason is that isalpha is implemented as XS code, while tolower
is a pure Perl subroutine that calls lc.  Apparently, the XS code is
executed in the same lexical context as the calling code, while the
lc inside tolower is compiled in POSIX.pm's own scope, where "use
locale" is not in effect.
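The scoping effect can be reproduced without POSIX at all. The sketch below targets a present-day Perl, where neither POSIX::isalpha nor POSIX::tolower is available any more. It makes two assumptions. First, plain_tolower() is a hypothetical stand-in for the 2001 pure-Perl POSIX::tolower. Second, the locale name "en_US.ISO-8859-1" is an assumption, because locale names vary between systems. The script runs chr 0xC0 through an lc compiled outside "use locale" and through one compiled inside it:

```perl
#!/usr/bin/perl
# Sketch: a Perl-level tolower that just calls lc() ignores the caller's
# "use locale", because the pragma is lexical to where lc() is compiled.
# plain_tolower() is a hypothetical stand-in for the old POSIX::tolower.
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Compiled where "use locale" is NOT in effect: this lc() uses Perl's
# native case rules regardless of who calls it.
sub plain_tolower { return lc $_[0] }

# The same one-liner compiled under "use locale": this lc() consults
# the current LC_CTYPE locale.
sub locale_lc {
    use locale;
    return lc $_[0];
}

# Compare both on chr 0xC0 (capital A with grave in ISO-8859-1) under
# each locale that setlocale() accepts.  The second name is an assumption.
for my $name ("C", "en_US.ISO-8859-1") {
    if (!defined setlocale(LC_CTYPE, $name)) {
        print "$name: locale not available\n";
        next;
    }
    my $c = chr 0xC0;
    printf "%-17s plain_tolower: 0x%02X  lc under use locale: 0x%02X\n",
        $name, ord(plain_tolower($c)), ord(locale_lc($c));
}
```

If the ISO-8859-1 locale is installed, one would expect the second column to show 0xE0 for it. Every other entry should stay at 0xC0.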

I think this raises some fundamental issues, but I'm not sure
exactly which.  It seems clear that one would like to be able to
write a correct tolower (i.e., exactly equivalent to lc, as per the
POSIX documentation) in pure Perl.  One possibility is a Tcl-like
"uplevel", but I desperately hope that doesn't turn out to be the
best option.  Another is to have dynamically scoped pragmata.
Another is to have a way to explicitly make a pragma dynamic, e.g.
"use :dynamically locale".  Anything better?
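Later Perls allow a rough pure-Perl approximation of the "uplevel" idea. caller() returns, as element 8 of its list, the $^H hints that the calling code was compiled with. locale.pm keeps its bit in $locale::hint_bits, which is an implementation detail rather than a documented interface. The sketch below uses these to choose between an lc compiled with "use locale" and one compiled without it. It makes three assumptions. The names caller_tolower() and caller_uses_locale() are hypothetical. Only the plain "use locale" form is checked, not "use locale" with arguments. A later Perl than 5.6 is assumed.

```perl
#!/usr/bin/perl
# Sketch: a pure-Perl tolower that follows the *caller's* "use locale"
# by inspecting the caller's compile-time hints ($^H).
# caller_tolower() and caller_uses_locale() are hypothetical names.
use strict;
use warnings;
use locale ();   # load locale.pm for $locale::hint_bits; don't enable it

# True if the statement that called our caller was compiled under a plain
# "use locale".  Frame 0 is the call to this helper (made inside
# caller_tolower); frame 1 is the call to caller_tolower itself, and
# element 8 of caller() is the $^H value at that call site.
sub caller_uses_locale {
    my $hints = (caller(1))[8] // 0;
    return ($hints & $locale::hint_bits) ? 1 : 0;
}

# Meant to behave like lc() would at the caller's call site.
sub caller_tolower {
    my ($s) = @_;
    if (caller_uses_locale()) {
        use locale;          # lexically scoped to this block only
        return lc $s;
    }
    return lc $s;            # compiled without "use locale"
}

print caller_tolower("MiXeD"), "\n";        # prints "mixed"
{
    use locale;
    print caller_tolower("MiXeD"), "\n";    # also prints "mixed"
}
```

This only reads hints that the caller already carries; it does not make the pragma itself dynamic.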

It also seems clear that XS code should not always run in the
lexical context of the caller, because I'm sure that there are cases
in which inheriting pragmata would be wrong or at least confusing
(action at a distance).  But I'm sure changing this has performance
implications.

Now, to respond to Nick:

On Sun, Feb 25, 2001 at 10:34:42PM +0000, nick@ing-simmons.net wrote:
> Because you have not called POSIX's setlocale() (I am not 100% sure 
> we support it.)

The perllocale documentation describes setlocale as fully
functional, but also mentions that it is only necessary if the
relevant environment variables are not set.  Grep for "one of the
following must be true".

At any rate, calling 'setlocale LC_ALL, "en_US";' didn't change
anything.
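POSIX::setlocale returns undef when it does not accept the requested name. Checking its return value therefore separates two cases: "the call failed" and "the call succeeded but made no visible difference". Below is a minimal sketch. The helper first_available_locale() is hypothetical, and the candidate names are assumptions that differ between systems.

```perl
#!/usr/bin/perl
# Sketch: check what setlocale() actually did.  It returns the name of the
# locale now in effect, or undef if the requested one was not accepted.
# first_available_locale() is a hypothetical helper; the candidate names
# are assumptions and differ between systems.
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);

# Try each candidate in order; return what setlocale() reported for the
# first one it accepts, or undef if none was accepted.
sub first_available_locale {
    for my $name (@_) {
        my $got = setlocale(LC_ALL, $name);
        return $got if defined $got;
    }
    return;
}

my $before = setlocale(LC_ALL);   # one argument: query only, no change
my $now    = first_available_locale("en_US.ISO-8859-1", "en_US", "C");
print "before: ", (defined $before ? $before : "(unknown)"), "\n";
print "now:    ", (defined $now    ? $now    : "(none accepted)"), "\n";
```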

> >and lc works as
> >expected in the program, but not on the debugger command line.
> 
> If I recall correctly
> 
> use locale; 
> is lexically scoped, and debugger is in a different scope.

Duh, sorry.  I'm not going to guess how the debugger accesses
lexical variables (someday I'll have the courage to look) ...

On the other hand, the debugger not adopting the lexical scope of the
current line of code is arguably a bug, especially since it does have
access to lexical variables.

> It is also far from uncommon for 'en_US' locales to claim that they are 
> ASCII only - (and that true Americans don't need no nasty pinko accented
> characters ;-)) 
> 
> But as your C program works that seems not to be the problem.

Yes, I am certain from testing and by looking at the locale source
that the en_US locale uses ISO-8859-1.
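In Perl 5.8 and later, the core I18N::Langinfo module can also report a locale's codeset from within Perl. Below is a minimal sketch. The helper codeset_of() is hypothetical, and the locale names are assumptions. The exact codeset strings depend on the C library; for example, glibc reports the "C" locale as "ANSI_X3.4-1968".

```perl
#!/usr/bin/perl
# Sketch: report the character set (codeset) a locale name maps to.
# codeset_of() is a hypothetical helper; the locale names are assumptions.
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Switch LC_CTYPE to $name, read its codeset, then restore the previous
# setting.  Returns undef if the locale is not available.
sub codeset_of {
    my ($name) = @_;
    my $old = setlocale(LC_CTYPE);
    my $ok  = setlocale(LC_CTYPE, $name);
    return unless defined $ok;
    my $codeset = langinfo(CODESET);
    setlocale(LC_CTYPE, $old);
    return $codeset;
}

for my $name ("C", "en_US", "en_US.ISO-8859-1") {
    my $cs = codeset_of($name);
    printf "%-17s %s\n", $name, defined $cs ? $cs : "(not available)";
}
```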

Andrew


