On 08/12/2017 11:55 AM, Zefram wrote:
> I wrote:
>> It provides a useful facility
>> that's otherwise difficult to achieve: errno-based messages that are
>> responsive to "use locale" in the same way as $!.
>
> Actually it's not quite the same, because there's an encoding issue.
> In scope of "use locale", my_strerror() returns a string encoded in the
> locale's charset. $! uses a dodgy heuristic to sometimes decode this.
>
> As a CPAN author, it'd be nice to have an API function that shows
> what would go into $! for a given errno. It'd have to return an SV,
> or operate by writing to a supplied SV. Currently the behaviour would
> be my_strerror() plus dubious setting of SvUTF8.
>
> As a core coder and general Perl programmer, it'd be nice to have proper
> string decoding on $!. It should be decoded based on the actual character
> encoding of the locale that supplied the string, not just a guess.
> It should be decoded regardless of what the encoding is, not only if
> it's UTF-8.

I don't understand much of your point, but patches welcome.

The heuristic you say is dodgy has been used traditionally in perl, and
it actually works well. For those of you who aren't familiar with it,
it leaves the UTF-8 flag off on strings whose representation is the same
in UTF-8 as not. For those, the flag's state is immaterial. For other
strings, it turns on the flag if and only if the string is syntactically
legal UTF-8. It turns out that, due to the structured nature of UTF-8 and
the chance way that symbols vs word characters are encoded in Latin-1,
it's very unlikely that a string of real words that are UTF-8 variant
will be incorrectly classified. The comments in the code quote
http://en.wikipedia.org/wiki/Charset_detection to that effect.

There is no way to determine with total reliability the locale that
something is encoded in across all systems that Perl can run on.

>
> -zefram
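
For anyone who wants to see the shape of the heuristic described above, here is a minimal standalone sketch in C. It is not the core's code: the function names (is_utf8_invariant, is_valid_utf8, should_set_utf8_flag) are made up for illustration, it assumes an ASCII platform, and it checks against strict standard UTF-8, which may be tighter than what perl itself accepts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every byte is ASCII (< 0x80). Such a string
 * has the same bytes whether or not it is treated as UTF-8, so the flag's
 * state is immaterial. Assumes an ASCII (not EBCDIC) platform. */
static bool is_utf8_invariant(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x80)
            return false;
    }
    return true;
}

/* Hypothetical helper: true if the bytes are syntactically legal UTF-8.
 * This sketch is strict: it rejects overlong forms, surrogates, and code
 * points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point, smallest legal value */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation byte or invalid lead */

        if (len - i - 1 < need)
            return false;       /* sequence truncated at end of string */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;

        i += need + 1;
    }
    return true;
}

/* The heuristic: leave the flag off for invariant strings; otherwise turn
 * it on if and only if the bytes are syntactically legal UTF-8. */
bool should_set_utf8_flag(const char *msg, size_t len)
{
    const unsigned char *s = (const unsigned char *)msg;

    if (is_utf8_invariant(s, len))
        return false;
    return is_valid_utf8(s, len);
}
```

With this sketch, a plain ASCII message leaves the flag off, the UTF-8 bytes for "café" turn it on, and the Latin-1 bytes for the same word leave it off, because a lone 0xE9 is a lead byte with no continuation bytes after it. That last case is the "structured nature of UTF-8" point: accented Latin-1 letters rarely happen to be followed by the bytes that would make them a legal multi-byte sequence.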