
Re: my_strerror() as API function

From: Karl Williamson
Date: August 12, 2017 22:31
Subject: Re: my_strerror() as API function
Message ID: f86cb701-7931-1e4f-5185-2a0ef6f8ef33@khwilliamson.com

On 08/12/2017 11:55 AM, Zefram wrote:
> I wrote:
>> It provides a useful facility
>> that's otherwise difficult to achieve: errno-based messages that are
>> responsive to "use locale" in the same way as $!.
> 
> Actually it's not quite the same, because there's an encoding issue.
> In scope of "use locale", my_strerror() returns a string encoded in the
> locale's charset.  $! uses a dodgy heuristic to sometimes decode this.
> 
> As a CPAN author, it'd be nice to have an API function that shows
> what would go into $! for a given errno.  It'd have to return an SV,
> or operate by writing to a supplied SV.  Currently the behaviour would
> be my_strerror() plus dubious setting of SvUTF8.
> 
> As a core coder and general Perl programmer, it'd be nice to have proper
> string decoding on $!.  It should be decoded based on the actual character
> encoding of the locale that supplied the string, not just a guess.
> It should be decoded regardless of what the encoding is, not only if
> it's UTF-8.

I don't understand much of your point, but patches welcome.

The heuristic you call dodgy is the one perl has traditionally used, and 
it actually works well.  For those of you who aren't familiar with it, 
it leaves the UTF-8 flag off on strings that have the same 
representation in UTF-8 as not (on ASCII platforms, strings made up 
entirely of ASCII characters).  For those, the flag's state is 
immaterial.  For any other string, it turns on the flag if and only if 
the string is syntactically legal UTF-8.  It turns out, due to the 
structured nature of UTF-8 and the chance way that symbols vs word 
characters are encoded in Latin-1, that it's very unlikely that a string 
of real words that is UTF-8 variant will be incorrectly classified.  The 
comments in the code quote http://en.wikipedia.org/wiki/Charset_detection 
to that effect.
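
For readers who want to see the shape of that heuristic, here is a minimal 
standalone sketch in C.  It is not the code in the perl core: the function 
names is_wellformed_utf8() and should_set_utf8_flag() are hypothetical, the 
sketch assumes an ASCII platform, and it checks against strict standard 
UTF-8 (no overlongs, no surrogates, nothing above U+10FFFF), which may be 
narrower than what perl itself accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a perl core function): returns true if the
 * byte string is well-formed standard UTF-8. */
bool is_wellformed_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, k;

        if (c < 0x80) { i++; continue; }          /* ASCII byte */

        /* Lead byte decides how many continuation bytes must follow.
         * 0x80-0xC1 and 0xF5-0xFF can never start a sequence. */
        if (c >= 0xC2 && c <= 0xDF)      { need = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { need = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { need = 3; cp = c & 0x07; }
        else return false;

        if (len - i <= need) return false;        /* truncated sequence */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (unsigned long)(s[i + k] & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if (need == 2 && cp < 0x800) return false;
        if (need == 2 && cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (need == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return false;

        i += need + 1;
    }
    return true;
}

/* Hypothetical sketch of the heuristic described above: leave the flag
 * off for pure-ASCII strings (its state is immaterial there); otherwise
 * turn it on if and only if the bytes are syntactically legal UTF-8. */
bool should_set_utf8_flag(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i] >= 0x80)                         /* first variant byte */
            return is_wellformed_utf8(p, len);
    }
    return false;                                 /* invariant string */
}
```

The Latin-1 case shows why misclassification is rare: in Latin-1 an 
accented letter such as 0xE9 is normally followed by an ASCII letter or by 
the end of the string, whereas UTF-8 requires a byte in the range 0x80-0xBF 
after it.  So "caf\xE9" (Latin-1) fails the check and the flag stays off, 
while "caf\xC3\xA9" (UTF-8) passes.  In real XS code one would presumably 
use the existing is_utf8_string() and SvUTF8_on() from the perl API rather 
than a hand-rolled validator.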

There is no way to determine, with total reliability across all the 
systems that Perl can run on, the locale that something is encoded in.
> 
> -zefram
> 
