perl.perl5.porters | Postings from June 2011

really no way to list supported encoding?

From: Tom Christiansen
Date: June 12, 2011 07:22
Subject: really no way to list supported encoding?
Message ID: 21311.1307888558@chthon

Is there really no way to definitively list the supported
encodings under Perl?  The Encode manpage states:

    To find out in detail which encodings are supported 
    by this package, see Encode::Supported.

But I see no programmatic list.  While I can certainly do this:

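    require Encode;
    # Resolve a candidate name ($ext) to an encoding object, if Encode knows it.
    if (my $enc_obj = Encode::find_encoding($ext)) {
        my $name = $enc_obj->name || $ext;   # fall back to the probed name
        $enc_name = "encoding($name)";
    }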

That still starts with something I'm probing; it doesn't give
me back a list of things to probe.  Looking at Encode::Alias,
I came up with something like this:

    % perl -MData::Dump -MEncode -MEncode::Config -MEncode::MIME::Name \
          -E 'dd \%{ +{ %Encode::Encoding, %Encode::Config::ExtModule, %Encode::MIME::Name::MIME_NAME_OF, reverse %Encode::MIME::Name::MIME_NAME_OF }}' \
      | perl -ne 'print if s/=>.*//' \
      | ucsort --upper-before-lower --preprocess='s/(\d+)/sprintf "%08d", $1/ge' \
      | perl -pe 's/"//g' \
      | fmt

      7bit-jis Adobe-Standard-Encoding AdobeStandardEncoding AdobeSymbol
      Adobe-Symbol-Encoding AdobeZdingbat ascii ascii-ctrl big5-eten
      Big5-HKSCS big5-hkscs cp37 cp424 cp437 cp500 cp737 cp775 cp850
      cp852 cp855 cp856 cp857 cp858 cp860 cp861 cp862 cp863 cp864 cp865
      cp866 cp869 cp874 cp875 cp932 cp936 cp949 cp950 cp1006 cp1026
      cp1047 cp1250 cp1251 cp1252 cp1253 cp1254 cp1255 cp1256 cp1257
      cp1258 dingbats euc-cn EUC-JP euc-jp EUC-KR euc-kr gb2312-raw
      gb12345-raw GBK gsm0338 hp-roman8 hz HZ-GB-2312 IBM037 IBM424
      IBM437 IBM500 IBM775 IBM850 IBM852 IBM855 IBM857 IBM860 IBM861
      IBM862 IBM863 IBM864 IBM865 IBM866 IBM869 IBM1026 IBM1047 ISO-2022-JP
      iso-2022-jp iso-2022-jp-1 ISO-2022-KR iso-2022-kr ISO-8859-1
      iso-8859-1 ISO-8859-2 iso-8859-2 ISO-8859-3 iso-8859-3 ISO-8859-4
      iso-8859-4 ISO-8859-5 iso-8859-5 ISO-8859-6 iso-8859-6 ISO-8859-7
      iso-8859-7 ISO-8859-8 iso-8859-8 ISO-8859-9 iso-8859-9 ISO-8859-10
      iso-8859-10 iso-8859-11 ISO-8859-13 iso-8859-13 ISO-8859-14
      iso-8859-14 ISO-8859-15 iso-8859-15 ISO-8859-16 iso-8859-16
      iso-ir-165 jis0201-raw jis0208-raw jis0212-raw johab koi8-f KOI8-R
      koi8-r KOI8-U koi8-u ksc5601-raw MacArabic MacCentralEurRoman
      MacChineseSimp MacChineseTrad MacCroatian MacCyrillic MacDingbats
      MacFarsi MacGreek MacHebrew MacIcelandic MacJapanese MacKorean
      MacRoman MacRomanian MacRumanian MacSami MacSymbol MacThai
      MacTurkish MacUkrainian MIME-B MIME-Header MIME-Header-ISO_2022_JP
      MIME-Q nextstep null posix-bc Shift_JIS shiftjis symbol UCS-2BE
      UCS-2LE Unicode US-ASCII UTF-7 UTF-8 utf8 utf-8-strict UTF-16
      UTF-16BE UTF-16LE UTF-32 UTF-32BE UTF-32LE VISCII viscii windows-1250
      windows-1251 windows-1252 windows-1253 windows-1254 windows-1255
      windows-1256 windows-1257 windows-1258

But that isn't really satisfactory for a few reasons, one of which
is that you can't predict a priori what the regexes in Encode::Alias
will match, for example that UCS-2 is an alias for UCS-2BE.
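
The alias problem can be seen by probing a few names directly.  This
is only a minimal sketch, assuming a stock Encode installation; the
candidate names and the canonical_names() helper are made up for
illustration and are not part of Encode:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical candidate names, chosen only for illustration.
my @candidates = qw(UCS-2 latin1 UTF-8 no-such-encoding);

# Map each candidate to the canonical name Encode resolves it to,
# or to undef when find_encoding() does not recognize it.
sub canonical_names {
    my %canon;
    for my $name (@_) {
        my $enc_obj = Encode::find_encoding($name);
        $canon{$name} = $enc_obj ? $enc_obj->name : undef;
    }
    return \%canon;
}

my $canon = canonical_names(@candidates);
for my $name (@candidates) {
    printf "%-18s => %s\n", $name, $canon->{$name} // "(not found)";
}
```

Each probe returns the canonical name behind an alias, so UCS-2 should
come back as UCS-2BE, but nothing in the hashes dumped above would
have told you that UCS-2 was worth trying in the first place.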

I feel as though I'm missing something obvious, so would someone please
be so kind as to tell me what that is?

thanks,

--tom
