May 23, 2007 08:53
Encode and emitting the little endian form of UTF-16 (not UTF-16LE)
Hi Dan,

I was wondering if there is some way to get Encode to emit the
little-endian version of UTF-16 (with a BOM), as a typical Win32 app on
Intel would. It seems to me that currently

my $octets= encode('UTF-16',$string);

will only emit the big-endian form of it.
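For example, here is a small sketch of what I mean. The one-character string "A" is just an arbitrary sample, and the hex dump is only there to make the octets visible:

```perl
use strict;
use warnings;
use Encode qw(encode);

# "A" is U+0041, a single BMP character (arbitrary sample input).
my $octets = encode('UTF-16', "A");

# Dump each octet in hex: the BOM comes out as FE FF, i.e. big-endian.
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $octets;
# prints "FE FF 00 41"
```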

Of course, well-behaved apps shouldn't care, but some do. I also know I
can emit the BOM by hand, like so:

my $octets= encode('UTF-16LE',chr(0xFEFF).$string);
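Spelled out, that workaround looks roughly like this. It is only a sketch: `encode_utf16le_bom` is a hypothetical helper name of my own, not something Encode provides, and the round trip just checks that a BOM-aware decoder (here Encode's own 'UTF-16') picks up the little-endian order from the BOM:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of Encode): prepend U+FEFF and encode
# as UTF-16LE, so the BOM is serialized as FF FE.
sub encode_utf16le_bom {
    my ($string) = @_;
    return encode('UTF-16LE', chr(0xFEFF) . $string);
}

my $octets = encode_utf16le_bom("A");
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $octets;
# prints "FF FE 41 00"

# decode('UTF-16', ...) reads the BOM, switches to little-endian and
# drops the BOM, so the original string comes back.
my $roundtrip = decode('UTF-16', $octets);
print $roundtrip eq "A" ? "round trip ok\n" : "mismatch\n";
```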

but this struck me as a bit convoluted, and it makes things tricky when
working with IO layers. If there isn't a way to do this currently, maybe
the name 'UTF-16:le' or something similar could be used for it?
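With IO layers, the closest I can get today is a UTF-16LE encoding layer plus printing the BOM by hand as the first character. A sketch, assuming a throwaway temp file from File::Temp purely for the demonstration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway temp file for this demonstration only.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# :raw strips any default CRLF layer so :encoding writes straight to
# the byte stream. The UTF-16LE layer never emits a BOM by itself,
# so U+FEFF has to be printed explicitly first.
open my $out, '>:raw:encoding(UTF-16LE)', $path or die "open: $!";
print {$out} "\x{FEFF}", "A";
close $out or die "close: $!";

# Read the file back as plain bytes to see what was written.
open my $in, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$in> };
close $in;
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
# prints "FF FE 41 00"
```

On Win32 one would usually also stack `:crlf` on top (`'>:raw:encoding(UTF-16LE):crlf'`) so that newlines are translated before encoding rather than after; I left that out to keep the byte dump simple.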

Also, it looks like there is a typo in the Quick Reference table of the
Encode::Unicode documentation:

    Quick Reference
                        Decodes from ord(N)           Encodes chr(N) to...
               octet/char BOM S.P d800-dfff  ord > 0xffff     \x{1abcd} ==
          UCS-2BE       2   N   N  is bogus                  Not Available
          UCS-2LE       2   N   N     bogus                  Not Available
          UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
          UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
          UTF-16LE      2   N   Y       S.P           S.P    0x2ad8,0xcddf
          UTF-32        4   Y   -  is bogus         As is            BE/LE
          UTF-32BE      4   N   -     bogus         As is       0x0001abcd
          UTF-32LE      4   N   -     bogus         As is       0xcdab0100
          UTF-8       1-4   -   -     bogus   >= 4 octets   \xf0\x9a\af\8d

Shouldn't UTF-16LE also be listed as 2/4 octets per character, like the
other UTF-16 variants?
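A quick check suggests UTF-16LE does produce surrogate pairs, so 2/4 looks right. This sketch uses "A" as a BMP sample and U+1ABCD (the character the table itself uses) as a supplementary-plane sample; `hex_octets` is just a local dump helper:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Local helper: hex-dump a string of octets.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, shift }

# One BMP character and one character outside the BMP.
for my $char ("A", "\x{1abcd}") {
    my $octets = encode('UTF-16LE', $char);
    printf "U+%04X -> %d octets: %s\n",
        ord $char, length $octets, hex_octets($octets);
}
# prints:
# U+0041 -> 2 octets: 41 00
# U+1ABCD -> 4 octets: 2A D8 CD DF
```

The 4-octet case is the surrogate pair 0xd82a,0xdfcd stored little-endian, matching the 0x2ad8,0xcddf already shown in the last column of the UTF-16LE row.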


perl -Mre=debug -e "/just|another|perl|hacker/"

