perl.perl5.porters | Postings from August 2013

Re: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+

From: Karl Williamson
Date: August 28, 2013 17:19
Subject: Re: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID: 521E3101.4080106@khwilliamson.com

On 08/28/2013 02:52 AM, Victor Efimov (via RT) wrote:
> # New Ticket Created by  Victor Efimov
> # Please include the string:  [perl #119499]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499 >
>
>

I am trying to understand your issues with this change.  I believe it is 
working correctly now.

> $! returned as character string under 5.19.2+ and UTF-8 locales. But as
> binary strings under single-byte encoding locales.

I don't understand your use of the word 'binary' here.  In both cases, 
it returns characters in contexts where strings are appropriate, and the 
numeric value in contexts where numbers are appropriate.

In string contexts, it returns the appropriate encoding.  In UTF-8 
locales, it returns the UTF-8 encoded character string.  In non-UTF-8 
locales, it returns the single-byte string in the correct encoding.
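
To illustrate the two contexts described above, here is a minimal sketch. It assumes only the core POSIX module; the variable names are made up for the example. The message text depends on the locale and platform, so none is shown.

```perl
use strict;
use warnings;
use POSIX qw(EACCES);

# Assigning a number to $! sets errno; $! then has two faces.
$! = EACCES;

my $number  = $! + 0;   # numeric context: the errno value
my $message = "$!";     # string context: the locale-dependent message

# Only locale-independent facts are printed here.
printf "errno=%d, message length=%d\n", $number, length($message);
```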
>
> I believe this is useless and just makes it harder to decode $! value
> properly.

I don't have a clue as to why you think this is useless.  This change 
was to fix https://rt.perl.org/rt3/Ticket/Display.html?id=112208
(reported also as perl #117429, so more than one person found this to be 
a bug).  The patch merely examines the string text of $!, and if it is 
UTF-8, sets the flag indicating that.
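
A rough Perl-level analogue of "examine the text and, if it is UTF-8, set the flag" might look like the sketch below. This is not the actual patch, which lives in the perl core; flag_if_utf8 is a hypothetical helper, and the byte strings are the first word of each dump quoted in the ticket.

```perl
use strict;
use warnings;

# Hypothetical helper: if the octets are well-formed UTF-8, return them
# as a character string; otherwise return the octets unchanged.
sub flag_if_utf8 {
    my ($octets) = @_;
    my $copy = $octets;
    # utf8::decode works in place and returns false on malformed input.
    return utf8::decode($copy) ? $copy : $octets;
}

# First word of the message, as UTF-8 octets and as CP1251 octets.
my $utf8_octets   = "\320\236\321\202\320\272\320\260\320\267\320\260\320\275\320\276";
my $cp1251_octets = "\316\362\352\340\347\340\355\356";

my $from_utf8   = flag_if_utf8($utf8_octets);
my $from_cp1251 = flag_if_utf8($cp1251_octets);

printf "UTF-8 input:  flag=%d length=%d\n",
    utf8::is_utf8($from_utf8) ? 1 : 0, length $from_utf8;
printf "CP1251 input: flag=%d length=%d\n",
    utf8::is_utf8($from_cp1251) ? 1 : 0, length $from_cp1251;
```

The CP1251 octets are not well-formed UTF-8, so they come back untouched and unflagged, which matches the second dump in the ticket.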

Code that is trying to decode $! should be using the (constant) numeric 
value rather than trying to parse the (locale-dependent) string.
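
A minimal sketch of that advice, assuming the core Errno module; classify_errno and its return labels are hypothetical names used only for illustration:

```perl
use strict;
use warnings;
use Errno qw(EACCES ENOENT);

# Hypothetical helper: branch on the errno number, never on the message.
sub classify_errno {
    my ($errno) = @_;
    return 'denied'  if $errno == EACCES;
    return 'missing' if $errno == ENOENT;
    return 'other';
}

$! = EACCES;                        # simulate a failed system call
my $kind = classify_errno($! + 0);  # numeric value is locale-independent

# The %! hash from Errno gives the same answer by name.
my $by_name = $!{EACCES} ? 'denied' : 'other';

print "$kind $by_name\n";
```

Because neither test looks at the message text, the result does not depend on LANG, LC_ALL, or LC_MESSAGES.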
>
> Also I am not sure if it will be possible to decode it when language with
> Latin-1 -only characters is set.

Again, use the numeric value when trying to parse the error.

>
> LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.utf8 perl -MPOSIX -MDevel::Peek
> -e '$!=EACCES; Dump "$!"'
>
> SV = PV(0x144dd80) at 0x14702a0
>    REFCNT = 1
>    FLAGS = (PADTMP,POK,pPOK,UTF8)
>    PV = 0x1468e30
> "\320\236\321\202\320\272\320\260\320\267\320\260\320\275\320\276 \320\262
> \320\264\320\276\321\201\321\202\321\203\320\277\320\265"\0 [UTF8
> "\x{41e}\x{442}\x{43a}\x{430}\x{437}\x{430}\x{43d}\x{43e} \x{432}
> \x{434}\x{43e}\x{441}\x{442}\x{443}\x{43f}\x{435}"]
>    CUR = 34
>    LEN = 40

I ran this, substituting 'say $!' for the Dump, and got this output:
Отказано в доступе

which is the correct Cyrillic text.  Prior to the patch, this would have 
printed garbage.
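
One way such garbage can arise is double encoding. The sketch below assumes the output handle has a UTF-8 encoding layer; that is an assumption, since the message does not say how the output was configured. written_through_utf8_layer is a hypothetical helper that writes to an in-memory file instead of a terminal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the octets produced when a string is
# printed through a UTF-8 encoding layer.
sub written_through_utf8_layer {
    my ($string) = @_;
    my $buffer = '';
    open my $out, '>:encoding(UTF-8)', \$buffer
        or die "in-memory open failed: $!";
    print {$out} $string;
    close $out;
    return $buffer;
}

# First word of the message as UTF-8 octets, without the UTF8 flag.
my $octets = "\320\236\321\202\320\272\320\260\320\267\320\260\320\275\320\276";
my $chars  = decode('UTF-8', $octets);   # same text as characters

my $garbled = written_through_utf8_layer($octets);  # each octet re-encoded
my $correct = written_through_utf8_layer($chars);   # encoded exactly once

printf "unflagged: %d octets out, flagged: %d octets out\n",
    length $garbled, length $correct;
```

With the flag set, the layer emits the original octets; without it, every octet is treated as a separate Latin-1 character and encoded again.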
>
>
> LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.CP1251 LC_MESSAGES=ru_RU.CP1251
> perl -MPOSIX -MDevel::Peek -e '$!=EACCES; Dump "$!"'
>
> SV = PV(0x1db8d80) at 0x1ddf7e0
>    REFCNT = 1
>    FLAGS = (PADTMP,POK,pPOK)
>    PV = 0x1f680d0 "\316\362\352\340\347\340\355\356 \342
> \344\356\361\362\363\357\345"\0
>    CUR = 18
>    LEN = 24

I do not have a Windows machine with CP1251, but I checked this dump by 
hand, and the characters are Отказано в доступе in that code page.  So 
this looks proper.
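
The same check can be done mechanically. This sketch assumes the core Encode module and copies the octets from the second dump and the code points from the first dump in the ticket:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Octets from the CP1251 dump (18 bytes, matching CUR = 18).
my $cp1251_octets =
    "\316\362\352\340\347\340\355\356 \342 \344\356\361\362\363\357\345";

# Code points from the UTF-8 dump.
my $expected =
    "\x{41e}\x{442}\x{43a}\x{430}\x{437}\x{430}\x{43d}\x{43e} \x{432} "
  . "\x{434}\x{43e}\x{441}\x{442}\x{443}\x{43f}\x{435}";

my $decoded = decode('cp1251', $cp1251_octets);

print $decoded eq $expected ? "match\n" : "mismatch\n";
```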

