
Re: [perl #98370] Perl doesn't decode IBM037

From: Mark Overmeer
Date: September 4, 2011 15:26
Subject: Re: [perl #98370] Perl doesn't decode IBM037
Message ID: 20110904222425.GJ28484@moon.overmeer.net
* tomushkin@gmail.com (perlbug-followup@perl.org) [110904 21:52]:
> # New Ticket Created by  tomushkin@gmail.com 
> # Please include the string:  [perl #98370]
> # in the subject line of all future correspondence about this issue. 
> # <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=98370 >
> 
> # perl -MEncode -e 'decode("IBM037", "\x{a2}\x{97}\x{81}\x{94}")'
> Unknown encoding 'IBM037' at -e line 1

With the attached simple script, you can find all encodings from the
official IANA list which are missing. Lines which end up with a leading
'*' are missing, those with a leading '+' are present:
  present from IANA: 125 charsets =  66 names,  59 aliases
  missing from IANA: 704 charsets = 190 names, 514 aliases

So there is a chance that you will encounter character sets which Perl
does not understand, among them many IBM* sets.
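
For the character set in this ticket, a possible workaround is sketched
below. It rests on an assumption which is not confirmed in this thread:
that the 'cp37' encoding shipped in Encode::EBCDIC is the same code page
as IANA's IBM037. If so, define_alias from Encode::Alias can make the
IANA name known to Encode.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use Encode qw/decode/;
use Encode::Alias;    # exports define_alias by default

# ASSUMPTION: Encode's 'cp37' is the code page IANA calls IBM037.
define_alias(IBM037 => 'cp37');

# The same bytes as in the ticket.
my $text = decode('IBM037', "\x{a2}\x{97}\x{81}\x{94}");
print "$text\n";
```

If the assumption holds, those four EBCDIC bytes should decode to "spam".
The alias is only known to the process that defines it.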
-----
#!/usr/bin/perl

use warnings;
use strict;
use Encode qw/find_encoding/;

sub status($);

open my $get, '-|', 'wget http://www.iana.org/assignments/character-sets --output-document=-'
   or die "cannot run wget: $!";

while(<$get>)
{
   if( m/(Name|Alias)\:\s+(\S+)/ )
   {   my $status = status $2;
       s/^/$status/;      # prefix the marker itself, not the literal word
   }
   else
   {   s/^/  /;
   }
   print;
}

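# Returns '+' when Encode knows the charset, '*' when it does not,
# and ' ' for the placeholder name "None".
sub status($)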
{   my $charset = shift;
    $charset =~ m/^none$/i and return ' ';

    my $enc = find_encoding $charset;
    defined $enc ? '+' : '*';
}
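
The script above only marks lines; it does not print totals like the
present/missing counts quoted earlier. A minimal sketch of how such totals
could be tallied follows. The sample lines are hypothetical stand-ins for
the IANA file, so no network access is needed, and the outcome depends on
the installed Encode version.

```perl
#!/usr/bin/perl

use warnings;
use strict;
use Encode qw/find_encoding/;

# Hypothetical sample in the "Name:" / "Alias:" layout matched above.
my @lines = (
    'Name: UTF-8',
    'Alias: None',
    'Name: IBM037',
    'Alias: cp037',
    'Alias: ebcdic-cp-us',
);

my %count;    # $count{'+' or '*'}{'Name' or 'Alias'}
for (@lines)
{   next unless m/(Name|Alias):\s+(\S+)/;
    my ($kind, $charset) = ($1, $2);
    next if $charset =~ m/^none$/i;      # placeholder, not a charset

    my $mark = defined find_encoding($charset) ? '+' : '*';
    $count{$mark}{$kind}++;
}

for my $mark ('+', '*')
{   my $names   = $count{$mark}{Name}  || 0;
    my $aliases = $count{$mark}{Alias} || 0;
    printf "%s from sample: %d charsets = %d names, %d aliases\n",
        ($mark eq '+' ? 'present' : 'missing'),
        $names + $aliases, $names, $aliases;
}
```

To run it against the real list, replace @lines with the lines read from
the wget pipe.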
