Re: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+

Karl Williamson
September 1, 2013 04:36
On 08/31/2013 07:27 AM, Father Chrysostomos via RT wrote:
> On Thu Aug 29 13:05:00 2013, wrote:
>> On 08/29/2013 02:15 AM, Victor Efimov via RT wrote:
>>> On Wed Aug 28 23:40:08 2013, sprout wrote:
>>>> So now, $! may or may not be encoded, and you have no way of telling
>>>> reliably without doing the same environment checks that perl itself did
>>>> internally before deciding to decode $! itself.
>> I don't follow these arguments.  What that commit did is only to look at
>> the string returned by the operating system, and if it is encoded in
>> UTF-8, to set that flag in the scalar.  That's it (*).  If the OS didn't
>> return UTF-8, it leaves the flag alone.  I find it hard to comprehend
>> that this isn't the right thing to do.  For the first time, $! in string
>> context is no different than any other string scalar in Perl.  They have
>> a utf-8 bit set which means that the encoding is in UTF-8,
> You are still describing this from the point of view of the internals.

I persist in this because I believe your point is a red herring.  I 
believe that it is a valid and strong argument that bringing outlier 
behavior into conformity with the rest of how Perl operates may very 
well trump other concerns.  I was attempting to show that that is what 
this commit did.

Rather than address most of the rest of your email, some of which I 
believe is specious or false, let's cut to the chase:

>  how would you rewrite this code to work in
> 5.19.2 and up?
> if (!open fh, $filename) {
>     # add_to_log expects a string of characters, so decode it
>     add_to_log($filename, 0+$!, Encode::decode(
>         I18N::Langinfo::langinfo(I18N::Langinfo::CODESET()),
>         $!
>     ));
>     return;
> }

I feel compelled to point out that this code is buggy.  I18N::Langinfo 
is not portable to all platforms that Perl runs on, and CODESET gives 
the codeset of the LC_CTYPE locale, which may not be the locale that 
$! is returned in: LC_MESSAGES. (Note that the code could be modified to 
change LC_CTYPE to the locale of LC_MESSAGES temporarily around the 
langinfo call to address this bug.)  Also, some vendors' nl_langinfo() 
was, at the time, so buggy that the core .t for this doesn't do any 
"real" testing.
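
A minimal sketch of the modification mentioned in the parenthetical above, assuming a platform where POSIX exports LC_MESSAGES and I18N::Langinfo works reliably. The name messages_codeset is hypothetical; it is not part of any module.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE LC_MESSAGES);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: report the codeset of the LC_MESSAGES locale by
# temporarily switching LC_CTYPE to it around the langinfo call.
sub messages_codeset {
    my $saved_ctype = setlocale(LC_CTYPE);     # query only, changes nothing
    my $msg_locale  = setlocale(LC_MESSAGES);  # locale that $! text comes from

    # Point LC_CTYPE at the messages locale so CODESET describes it.
    setlocale(LC_CTYPE, $msg_locale) if defined $msg_locale;
    my $codeset = langinfo(CODESET);

    # Put LC_CTYPE back the way we found it.
    setlocale(LC_CTYPE, $saved_ctype) if defined $saved_ctype;

    return $codeset;
}
```

The result could then be passed to Encode::decode in place of the bare langinfo call in the quoted code. It does nothing for platforms where nl_langinfo() itself is unreliable.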

But on platforms where it works reliably, and the typical case where 
LC_CTYPE matches LC_MESSAGES, my commit does break this code.  If it 
were my code here, I'd 'use bytes' (I don't believe it should be 
removed from core; this area is one of the few valid uses for it, 
and this is not the thread to discuss it), or utf8::is_utf8() (I think 
we should soften somewhat the admonition against using that.)
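
A sketch of what the utf8::is_utf8() route might look like for the quoted code. The helper name errno_to_chars and its $codeset parameter are assumptions made for illustration; the caller still has to supply the codeset, for instance from the langinfo call in the original snippet.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the message as a string of characters,
# whether or not perl has already marked it as UTF-8.
#   $msg     - "$!" captured immediately after the failing call
#   $codeset - encoding the OS messages are assumed to be in
sub errno_to_chars {
    my ($msg, $codeset) = @_;

    # Flag set: the commit found UTF-8 and marked the scalar, so the
    # string is already characters and must not be decoded again.
    return $msg if utf8::is_utf8($msg);

    # Flag clear: the scalar still holds bytes; decode them as before.
    return Encode::decode($codeset, $msg);
}
```

With that in place, the third argument in the quoted code would become errno_to_chars("$!", $codeset), which behaves the same before and after 5.19.2.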

I think all of us would agree that deference should be paid to 
(apparently) working code when making changes.  And it may be that this 
commit is so egregious, or helpful in so few places, that its benefit 
does not justify the cost of keeping it.

And $! remains an outlier in the sense that it is now, AFAIK (and I've 
looked hard, though perhaps not hard enough), the only place (except for 
some POSIX:: routines) where the program's underlying current locale 
leaks outside the scope of 'use locale'.  The main argument that I've heard 
for doing that is that $! is often for the end-user and not the 
programmer.  But it isn't for the end user if what gets displayed is 
gibberish, which includes being in some language the user doesn't know, 
though the latter is better than garbage bytes.  So what I'm advocating 
is re-examining whether we wish $! to respect 'use locale' or not.  If 
we chose to respect 'use locale', outside that, it would return messages 
in the system default locale, typically "C".

I'm pretty confident that the problem can't be solved so that no code 
has to change and things just start working correctly for everybody.
Currently, using $! in production code that can be operated by users who 
might have their own locales is much more complicated than people 
imagine.  "die $!" could print gibberish.  Maybe a partial answer is to 
create a wrapper that does the best it can on the platform it is running 
on, and suggest people change to use it.
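
One possible shape for such a wrapper, offered only as a sketch. The name best_effort_errno_text and the fallback order are assumptions, and it inherits the LC_CTYPE versus LC_MESSAGES caveat discussed earlier because it uses CODESET as is.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical wrapper: turn "$!" into characters as well as the
# platform allows, and never return undecodable bytes to the caller.
sub best_effort_errno_text {
    my ($raw) = @_;    # pass "$!" captured right after the failure

    # Already marked as characters by perl: nothing to do.
    return $raw if utf8::is_utf8($raw);

    # Plain ASCII needs no decoding.
    return $raw unless $raw =~ /[^\x00-\x7F]/;

    # Ask for a codeset only if I18N::Langinfo is usable on this platform.
    my $codeset = eval {
        require I18N::Langinfo;
        I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
    };

    # Strict decode; LEAVE_SRC keeps $raw intact if decoding fails.
    if (defined $codeset && length $codeset) {
        my $text = eval {
            Encode::decode($codeset, $raw,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        return $text if defined $text;
    }

    # Last resort: keep the ASCII part and mark the rest.
    (my $safe = $raw) =~ s/[^\x00-\x7F]/?/g;
    return $safe;
}
```

Code that now says die $! would say die best_effort_errno_text("$!") instead. The message can still be in a language the user doesn't know, but it should no longer be garbage bytes.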

If this commit is reverted, we do need to decide how we will address the 
bugs it fixed and the new ones that are sure to come in (barring some 
better answer).  Do we reject them and say you need to handle $! yourself?
