[perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
From: Victor Efimov via RT
Date: August 28, 2013 17:40
Subject: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID: rt-3.6.HEAD-1873-1377711640-1517.119499-15-0@perl.org
> Code that is trying to decode $! should be using the (constant) numeric
> value rather than trying to parse the (locale-dependent) string.
I am not trying to parse $!. I am trying to print the original error
message to the screen for the user.
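For reference, the numeric approach suggested in the quoted text might look roughly like the sketch below. The helper name is_missing_file and the path are hypothetical, chosen only for illustration. It covers classifying an error, not displaying its message, which is the case at issue here.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Hypothetical helper: classify a failure by its errno number,
# which does not depend on the locale, instead of by message text.
sub is_missing_file {
    my ($errno) = @_;
    return $errno == ENOENT;
}

my $path = "/nonexistent/hypothetical/path";   # illustrative path only
unless (open(my $fh, "<", $path)) {
    my $errno = $! + 0;                        # numeric context of $!
    print "no such file\n" if is_missing_file($errno);
}
```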
> In string contexts, it returns the appropriate encoding. In UTF-8
> locales, it returns the UTF-8 encoded character string. In non-UTF-8
> locales, it returns the single-byte string in the correct encoding.
It is just wrong to return bytes in some cases and characters in others.
The following example worked fine before this change:
use strict;
use warnings;
use I18N::Langinfo;
use Encode;
my $enc = I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
binmode STDOUT, ":encoding($enc)";
my $filename = "not a file ".chr(0x444);
open my $f, "<", $filename or do {
    my $error = "$!";
    $error = decode($enc, "$error");
    print "Error accessing file $filename: $error\n";
};
But with this change:
- under non-Unicode locales it works fine;
- under UTF-8 locales it fails with "Cannot decode string with wide
characters".
A possible fix for this example is to replace
$error = decode($enc, "$error");
with
$error = utf8::is_utf8($error) ? $error : decode($enc, "$error");
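A minimal sketch of that check wrapped in a helper, so the same code runs on perls from before and after the change. The name error_text is hypothetical. The sketch assumes that an unflagged string still holds bytes in the locale encoding, which may not hold for every message.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the text of an error as a character string.
# $error is a stringified copy of $!; $enc is the locale codeset name.
sub error_text {
    my ($error, $enc) = @_;
    # Already a character string (5.19.2+ under a UTF-8 locale): leave it alone.
    return $error if utf8::is_utf8($error);
    # Otherwise assume bytes in the locale encoding and decode them.
    return decode($enc, $error);
}
```

In the example above, the decode line would then become $error = error_text($error, $enc);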
Another place where it breaks old code is:
perl -e 'open my $f, "<", "notafile" or die $!'
This now prints the warning "Wide character in die" when the locale is
UTF-8 and the message contains wide characters.
> I ran this, substituting 'say $!' for the Dump, and got this output:
> Отказано в доступе
> which is the correct Cyrillic text. Prior to the patch, this would have
> printed garbage.
No, prior to this patch it printed the correct (same) text, but without
the "Wide character" warnings.
On Wed Aug 28 10:19:47 2013, public@khwilliamson.com wrote:
> On 08/28/2013 02:52 AM, Victor Efimov (via RT) wrote:
> > # New Ticket Created by Victor Efimov
> > # Please include the string: [perl #119499]
> > # in the subject line of all future correspondence about this issue.
> > # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499 >
> >
> >
>
> I am trying to understand your issues with this change. I believe it is
> working correctly now.
>
> > $! returned as character string under 5.19.2+ and UTF-8 locales. But as
> > binary strings
> > under single-byte encoding locales.
>
> I don't understand your use of the word 'binary' here. In both cases,
> it returns characters in contexts where strings are appropriate, and the
> numeric value in contexts where numbers are appropriate
>
> In string contexts, it returns the appropriate encoding. In UTF-8
> locales, it returns the UTF-8 encoded character string. In non-UTF-8
> locales, it returns the single-byte string in the correct encoding.
> >
> > I believe this is useless and just makes it harder to decode $! value
> > properly.
>
> I don't have a clue as to why you think this is useless. This change
> was to fix https://rt.perl.org/rt3/Ticket/Display.html?id=112208
> (reported also as perl #117429, so more than one person found this to be
> a bug). The patch merely examines the string text of $!, and if it is
> UTF-8, sets the flag indicating that.
>
> Code that is trying to decode $! should be using the (constant) numeric
> value rather than trying to parse the (locale-dependent) string.
> >
> > Also I am not sure if it will be possible to decode it when language with
> > Latin-1 -only characters is set.
>
> Again, use the numeric value when trying to parse the error.
>
> >
> > LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.utf8 perl -MPOSIX -MDevel::Peek
> > -e '$!=EACCES; Dump "$!"'
> >
> > SV = PV(0x144dd80) at 0x14702a0
> > REFCNT = 1
> > FLAGS = (PADTMP,POK,pPOK,UTF8)
> > PV = 0x1468e30
> > "\320\236\321\202\320\272\320\260\320\267\320\260\320\275\320\276 \320\262
> > \320\264\320\276\321\201\321\202\321\203\320\277\320\265"\0 [UTF8
> > "\x{41e}\x{442}\x{43a}\x{430}\x{437}\x{430}\x{43d}\x{43e} \x{432}
> > \x{434}\x{43e}\x{441}\x{442}\x{443}\x{43f}\x{435}"]
> > CUR = 34
> > LEN = 40
>
> I ran this, substituting 'say $!' for the Dump, and got this output:
> Отказано в доступе
>
> which is the correct Cyrillic text. Prior to the patch, this would have
> printed garbage.
> >
> >
> > LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.CP1251 LC_MESSAGES=ru_RU.CP1251
> > perl -MPOSIX -MDevel::Peek -e '$!=EACCES; Dump "$!"'
> >
> > SV = PV(0x1db8d80) at 0x1ddf7e0
> > REFCNT = 1
> > FLAGS = (PADTMP,POK,pPOK)
> > PV = 0x1f680d0 "\316\362\352\340\347\340\355\356 \342
> > \344\356\361\362\363\357\345"\0
> > CUR = 18
> > LEN = 24
>
> I do not have a Windows machine with CP1251, but I hand looked at this
> dump, and the characters are Отказано в доступе in that code page. So
> this looks proper.
>
---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499