develooper Front page | perl.perl5.porters | Postings from August 2013

[perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+

From: Victor Efimov via RT
Date: August 29, 2013 22:27
Subject: [perl #119499] $! returned with UTF-8 flag under UTF-8 locales only under 5.19.2+
Message ID: rt-3.6.HEAD-1873-1377815238-982.119499-15-0@perl.org

On Thu Aug 29 14:06:57 2013, vsespb wrote:
> On Thu Aug 29 13:05:00 2013, public@khwilliamson.com wrote:
> > 2) As Victor notes, the commit does a UTF-8 validity check, so it is
> I agree that it's pretty reliable. However different languages and

Here is a generator of byte sequences that are valid in UTF-8 and in
another encoding, and which represent letters (\w) in that other encoding.

#!/usr/bin/env perl

use strict;
use warnings;
use Encode;
use utf8;

binmode STDOUT, ":encoding(UTF-8)";

my @A = grep { /\w/  } map { chr($_) } (128..1024);

for my $z1 (@A) { for my $z2 ('', @A) { for my $z3 ('', @A) {
for my $encoding (qw/ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4
ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-10/) {
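 my $S = $z1.$z2.$z3;
 # encode() with a CHECK argument may overwrite its input, so pass a copy.
 my $e =  eval { encode($encoding, "$S", Encode::FB_CROAK); };
 next unless $e; # skip strings this encoding cannot represent
 my $xx = $e;
 $xx =~ s/(.)/sprintf("\\x%02X",ord($1))/eg; # bytes as \xNN escapes
 Encode::_utf8_on($e); # reinterpret the encoded bytes as UTF-8
 if (utf8::valid($e)) {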
  print "# $encoding [$S]".(length($S))." [$e] [$xx]\n";
  print <<"END";
perl -e 'use Encode; binmode STDOUT, ":encoding(UTF-8)"; my \$z = "$xx";
print "[", decode("UTF-8", "\$z", Encode::FB_CROAK), "]\\t[",
decode("$encoding", "\$z", Encode::FB_CROAK), "]\\n"'
END
 }
}
}}}
__END__

example output:

perl -e 'use Encode; binmode STDOUT, ":encoding(UTF-8)"; my $z =
"\xC3\xBE"; print "[", decode("UTF-8", "$z", Encode::FB_CROAK), "]\t[",
decode("ISO-8859-2", "$z", Encode::FB_CROAK), "]\n"'
perl -e 'use Encode; binmode STDOUT, ":encoding(UTF-8)"; my $z =
"\xC3\xBC"; print "[", decode("UTF-8", "$z", Encode::FB_CROAK), "]\t[",
decode("ISO-8859-2", "$z", Encode::FB_CROAK), "]\n"'
perl -e 'use Encode; binmode STDOUT, ":encoding(UTF-8)"; my $z =
"\xC3\xA1"; print "[", decode("UTF-8", "$z", Encode::FB_CROAK), "]\t[",
decode("ISO-8859-2", "$z", Encode::FB_CROAK), "]\n"'

output of running one of the generated commands:

$ perl -e 'use Encode; binmode STDOUT, ":encoding(UTF-8)"; my $z =
"\xC3\xBC"; print "[", decode("UTF-8", "$z", Encode::FB_CROAK), "]\t[",
decode("ISO-8859-2", "$z", Encode::FB_CROAK), "]\n"'
[ü]     [Ăź]
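
The same check can be made for a single byte string without generating
shell commands. The following is a minimal sketch, not part of the
generator above: decode_strict and is_ambiguous are hypothetical helper
names, and it uses a strict decode("UTF-8", ...) instead of
Encode::_utf8_on plus utf8::valid, so it may reject a few sequences that
the generator accepts.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode;

binmode STDOUT, ":encoding(UTF-8)";

# Hypothetical helper: returns the decoded string if $bytes decodes
# cleanly in $encoding, otherwise undef.
sub decode_strict {
    my ($encoding, $bytes) = @_;
    # decode() with a CHECK argument may overwrite its input, so use a copy.
    my $copy = $bytes;
    my $text = eval { decode($encoding, $copy, Encode::FB_CROAK) };
    return $text;
}

# Hypothetical helper: true if $bytes is well-formed UTF-8 and also
# decodes to nothing but word characters (\w) in $encoding.
sub is_ambiguous {
    my ($encoding, $bytes) = @_;
    my $as_utf8  = decode_strict("UTF-8", $bytes);
    my $as_other = decode_strict($encoding, $bytes);
    return 0 unless defined $as_utf8 && defined $as_other;
    return $as_other =~ /\A\w+\z/ ? 1 : 0;
}

for my $bytes ("\xC3\xBC", "\xFC") {
    my $hex = join " ", map { sprintf "%02X", ord } split //, $bytes;
    printf "%s: %s\n", $hex,
        is_ambiguous("ISO-8859-2", $bytes) ? "ambiguous" : "not ambiguous";
}
```

The bytes C3 BC are reported as ambiguous (ü in UTF-8, Ăź in ISO-8859-2),
while the single byte FC (ü in ISO-8859-2) is not, because it is not
well-formed UTF-8 on its own.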


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=119499
