develooper Front page | perl.perl5.porters | Postings from August 2013

[perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+

From:
Victor Efimov via RT
Date:
August 28, 2013 17:40
Subject:
[perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID:
rt-3.6.HEAD-1873-1377711640-1517.119499-15-0@perl.org
> Code that is trying to decode $! should be using the (constant) numeric
> value rather than trying to parse the (locale-dependent) string.

I am not trying to parse $!. I am trying to print the original error
message to the screen for the user.
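
For reference, the numeric check recommended in the quoted text might look
like the sketch below. It assumes the Errno module and sets $! by hand so
that the result does not depend on the file system; the variable names are
only illustrative. It covers testing which error occurred, not displaying
the message, which is the case I care about.

```perl
use strict;
use warnings;
use Errno qw(ENOENT);

# Set errno directly so the example does not depend on the file system.
$! = ENOENT;

my $code    = $! + 0;               # numeric context: errno, locale-independent
my $matches = $!{ENOENT} ? 1 : 0;   # %! (provided by Errno) tests errno by name
my $text    = "$!";                 # string context: locale-dependent message
```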

> In string contexts, it returns the appropriate encoding. In UTF-8
> locales, it returns the UTF-8 encoded character string. In non-UTF-8
> locales, it returns the single-byte string in the correct encoding.

It is simply wrong to return bytes in some cases and characters in others.

The following example worked fine before this change:

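use strict;
use warnings;
use I18N::Langinfo;
use Encode;
# Encoding of the current locale, e.g. "UTF-8" or "CP1251".
my $enc = I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
binmode STDOUT, ":encoding($enc)";

# Name of a file that does not exist, containing a Cyrillic letter.
my $filename = "not a file ".chr(0x444);

open my $f, "<", $filename or do {
   my $error = "$!";
   # Treat the message as bytes in the locale encoding and decode it.
   $error = decode($enc, "$error");
   print "Error accessing file $filename: $error\n";
};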

but with this change it:

- works fine under non-Unicode locales;
- fails under UTF-8 locales with "Cannot decode string with wide
  characters".
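
The failure can be reproduced without any locale setup. In this minimal
sketch a hand-written character string stands in for what "$!" holds on
5.19.2+ under a UTF-8 locale (an assumption for illustration, not real $!
output), and decoding it a second time dies:

```perl
use strict;
use warnings;
use Encode qw(decode);

# A character string with code points above 0xFF, standing in for the
# already-decoded "$!".
my $already_decoded = "\x{41e}\x{442}\x{43a}";

# Decoding it again dies; capture the error instead of aborting.
my $result  = eval { decode("UTF-8", $already_decoded) };
my $failure = $@;
```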


A possible fix for this example is:

replace
  $error = decode($enc, "$error");
with
  $error = utf8::is_utf8($error) ? $error : decode($enc, "$error");
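
Wrapped in a helper, the workaround might look like the sketch below. The
sub name error_text is hypothetical, and the sample strings are hand-written
stand-ins for the two forms of the message, not real $! output:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Perl or Encode): return an error message
# as a character string whether or not it has already been decoded.
sub error_text {
    my ($error, $enc) = @_;
    return $error if utf8::is_utf8($error);   # already characters
    return decode($enc, $error);              # still locale-encoded bytes
}

# UTF-8 bytes, as older perls return them under a UTF-8 locale ...
my $from_bytes = error_text("\xD0\x9E\xD1\x82", "UTF-8");
# ... and the same text already flagged as characters.
my $from_chars = error_text("\x{41e}\x{442}", "UTF-8");
```

Both calls yield the same two-character string, so the caller no longer
needs to know which perl version produced the message.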

Another case where it breaks old code is:

perl -e 'open my $f, "<", "notafile" or die $!'

which now prints the warning "Wide character in die" when the locale is
UTF-8 and the message contains wide characters.
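
The mechanism behind that warning can be sketched with print and in-memory
handles instead of die and STDERR (a substitution made only so the example
is self-contained): a character string written to a handle without an
encoding layer triggers the warning, while a handle with an explicit layer
does not. For die itself the analogous step would presumably be a binmode
on STDERR with the locale encoding.

```perl
use strict;
use warnings;

# Character string with code points above 0xFF, like the new "$!".
my $msg = "\x{41e}\x{442}\x{43a}\x{430}\x{437}";

# Collect warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Handle without an encoding layer: perl warns and writes its internal UTF-8.
my $raw_buf;
open my $raw, ">", \$raw_buf or die "cannot open in-memory handle: $!";
print {$raw} $msg;
close $raw;

# Handle with an explicit layer: the same bytes, but no warning.
my $enc_buf;
open my $out, ">:encoding(UTF-8)", \$enc_buf or die "cannot open in-memory handle: $!";
print {$out} $msg;
close $out;
```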



> I ran this, substituting 'say $!' for the Dump, and got this output:
> Отказано в доступе
> which is the correct Cyrillic text. Prior to the patch, this would have
> printed garbage.

No, prior to this patch it printed the same, correct text, but without the
"Wide character" warnings.



On Wed Aug 28 10:19:47 2013, public@khwilliamson.com wrote:
> On 08/28/2013 02:52 AM, Victor Efimov (via RT) wrote:
> > # New Ticket Created by  Victor Efimov
> > # Please include the string:  [perl #119499]
> > # in the subject line of all future correspondence about this issue.
> > # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499 >
> >
> >
> 
> I am trying to understand your issues with this change.  I believe it is 
> working correctly now.
> 
> > $! returned as character string under 5.19.2+ and UTF-8 locales. But as
> > binary strings
> > under single-byte encoding locales.
> 
> I don't understand your use of the word 'binary' here.  In both cases, 
> it returns characters in contexts where strings are appropriate, and the 
> numeric value in contexts where numbers are appropriate
> 
> In string contexts, it returns the appropriate encoding.  In UTF-8 
> locales, it returns the UTF-8 encoded character string.  In non-UTF-8 
> locales, it returns the single-byte string in the correct encoding.
> >
> > I believe this is useless and just makes it harder to decode $! value
> > properly.
> 
> I don't have a clue as to why you think this is useless.  This change 
> was to fix https://rt.perl.org/rt3/Ticket/Display.html?id=112208
> (reported also as perl #117429, so more than one person found this to be 
> a bug).  The patch merely examines the string text of $!, and if it is 
> UTF-8, sets the flag indicating that.
> 
> Code that is trying to decode $! should be using the (constant) numeric 
> value rather than trying to parse the (locale-dependent) string.
> >
> > Also I am not sure if it will be possible to decode it when language with
> > Latin-1 -only characters is set.
> 
> Again, use the numeric value when trying to parse the error.
> 
> >
> > LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.utf8 perl -MPOSIX -MDevel::Peek
> > -e '$!=EACCES; Dump "$!"'
> >
> > SV = PV(0x144dd80) at 0x14702a0
> >    REFCNT = 1
> >    FLAGS = (PADTMP,POK,pPOK,UTF8)
> >    PV = 0x1468e30
> > "\320\236\321\202\320\272\320\260\320\267\320\260\320\275\320\276 \320\262
> > \320\264\320\276\321\201\321\202\321\203\320\277\320\265"\0 [UTF8
> > "\x{41e}\x{442}\x{43a}\x{430}\x{437}\x{430}\x{43d}\x{43e} \x{432}
> > \x{434}\x{43e}\x{441}\x{442}\x{443}\x{43f}\x{435}"]
> >    CUR = 34
> >    LEN = 40
> 
> I ran this, substituting 'say $!' for the Dump, and got this output:
> Отказано в доступе
> 
> which is the correct Cyrillic text.  Prior to the patch, this would have 
> printed garbage.
> >
> >
> > LANG=ru_RU LANGUAGE=ru_RU:ru LC_ALL=ru_RU.CP1251 LC_MESSAGES=ru_RU.CP1251
> > perl -MPOSIX -MDevel::Peek -e '$!=EACCES; Dump "$!"'
> >
> > SV = PV(0x1db8d80) at 0x1ddf7e0
> >    REFCNT = 1
> >    FLAGS = (PADTMP,POK,pPOK)
> >    PV = 0x1f680d0 "\316\362\352\340\347\340\355\356 \342
> > \344\356\361\362\363\357\345"\0
> >    CUR = 18
> >    LEN = 24
> 
> I do not have a Windows machine with CP1251, but I hand looked at this 
> dump, and the characters are Отказано в доступе in that code page.  So 
> this looks proper.
> 




---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499


