perl.perl5.porters | Postings from August 2013

[perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+

From: Victor Efimov via RT
Date: August 28, 2013 20:02
Subject: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID: rt-3.6.HEAD-1873-1377720134-1598.119499-15-0@perl.org

> Automatic decoding is definitely the more useful behavior

Yes, provided that:
a) it's documented (perllocale, perlunicode or perlvar)

b) it doesn't break existing code,
OR
c) it's turned on with 'use feature' or something similar.


> I agree inconsistency is a bad thing though.

Yes, especially when it's sometimes bytes and sometimes characters, and
you have to check the UTF-8 flag to tell which.

> Not sure it's easy to fix though.

I think in Perl you can get the locale's encoding with
I18N::Langinfo::langinfo(I18N::Langinfo::CODESET())
and then decode using the Encode module (both are core modules).
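
A minimal sketch of that approach, for illustration only. The helper name
errno_text() is hypothetical (it is not part of Errno or any core module),
and falling back to the raw bytes when Encode does not recognise the
codeset name is my own assumption, not agreed behaviour:

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Errno qw(ENOENT);
use I18N::Langinfo qw(langinfo CODESET);
use POSIX qw(setlocale LC_ALL);

# errno_text() is a hypothetical helper, not an existing API.
# It returns $message as a character string, decoding it from $codeset
# (default: the current locale's codeset) when it arrives as bytes.
sub errno_text {
    my ($message, $codeset) = @_;

    # Already a character string (e.g. 5.19.2+ under a UTF-8 locale).
    return $message if utf8::is_utf8($message);

    $codeset = langinfo(CODESET) unless defined $codeset;
    return $message unless defined $codeset && length $codeset;

    # Assumption: if Encode does not know the codeset, keep the bytes as-is.
    my $encoding = find_encoding($codeset) or return $message;
    return $encoding->decode($message);
}

setlocale(LC_ALL, '');          # adopt the locale from the environment
$! = ENOENT;                    # set an error so that "$!" has a message
my $text = errno_text("$!");    # character string where decoding was possible
```

The second argument exists only so the decoding step can be tried without
switching locales, e.g. errno_text("\xCD\xE5\xF2", "cp1251").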

Perhaps that can be fixed in Perl code, in Errno (we already load
Errno when %! is accessed), which would then auto-load I18N::Langinfo
and Encode?

And I am not at all sure about the perl C internals.

> Patches welcome.

I cannot do C coding.

Also, I think that old code relying on the old behaviour was not relying
on something undocumented.

It was partly documented:

http://perldoc.perl.org/perllocale.html
> Note especially that the string value of $! and the error messages
> given by external utilities may be changed by LC_MESSAGES

(also, perllocale now has updates related to $! in blead)

http://perldoc.perl.org/perlunicode.html
> there are still many places where Unicode (in some encoding or
> another) could be given as arguments or received as results, or both,
> but it is not.


So the ideal fix would be, IMHO:
1. document it (perllocale, perlunicode or perlvar)
2. decode $! under non-UTF-8 locales too, so it always returns character strings
3. turn on the new behaviour only with 'use feature'



On Wed Aug 28 12:44:17 2013, LeonT wrote:
> On Wed, Aug 28, 2013 at 10:52 AM, Victor Efimov
> <perlbug-followup@perl.org> wrote:
> 
> > $! returned as character string under 5.19.2+ and UTF-8 locales. But as
> > binary strings
> > under single-byte encoding locales.
> >
> > I believe this is useless and just makes it harder to decode $! value
> > properly.
> >
> 
> Automatic decoding is definitely the more useful behavior. I agree
> inconsistency is a bad thing though. Not sure it's easy to fix though.
> Patches welcome.
> 
> > Also I am not sure if it will be possible to decode it when language with
> > Latin-1 -only characters is set.
> 
> 
> AFAIK that should work perfectly fine.
> 
> Leon




---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499
