Re: [rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds

From:
ikegami@adaelis.com via RT
Date:
December 30, 2011 16:04
Subject:
Re: [rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds
Message ID:
rt-3.8.HEAD-6893-1325289871-1207.73623-6-0@rt.cpan.org
<URL: https://rt.cpan.org/Ticket/Display.html?id=73623 >

On Fri, Dec 30, 2011 at 7:01 PM, Eric Brine <ikegami@adaelis.com> wrote:

> On Fri, Dec 30, 2011 at 6:44 PM, Linda A Walsh via RT <
> bug-Encode@rt.cpan.org> wrote:
>
>> <URL: https://rt.cpan.org/Ticket/Display.html?id=73623 >
>>
>> On Fri Dec 30 17:49:01 2011, ikegami wrote:
>> > Fix:
>> >
>> > -    my $need2slurp = $use_bom{ find_encoding($to)->name };
>> > +    my $need2slurp = $use_bom{ find_encoding($from)->name };
>> > +    if ($Opt{debug}){
>> > +        printf "Read mode: %s\n", $need2slurp ? 'Slurp' : 'Line';
>> > +    }
>> =====
>> Partly works:
>> > piconv -f UTF-16 -t UTF-8 <test.in >test.out
>> > iconv -f UTF-16 -t UTF-8 <test.in >testi.out
>> > cmp testi.out test.out && echo ok
>> ok
>> > piconv -f UTF-8 -t UTF-16 <test.out >test2.out
>> > cmp test.in test2.out
>> test.in test2.out differ: byte 1, line 1
>>
>
> Correction/elaboration:
>
> C<< decode('UTF-16', ...) >> both requires a BOM and removes it
> (intentionally).
>

...and C<< encode('UTF-16', ...) >> adds it back, but uses UTF-16be instead
of UTF-16le.

You need C<< -to UTF-16le >> to get UTF-16le output (instead of UTF-16be),
but that won't add the BOM. Instead, you need to avoid removing the BOM in
the first place by using C<< -from UTF-16le >>.

- Eric
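
The BOM behaviour described above can be reproduced directly with Encode,
outside of piconv. The following is a minimal sketch, not part of the
original message: the sample bytes and variable names are made up for
illustration, and it assumes the behaviour documented in Encode::Unicode
(the generic 'UTF-16' consumes and emits a BOM; an explicit 'UTF-16LE'
treats the BOM as an ordinary U+FEFF character).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample input: "hi" as little-endian UTF-16 with a BOM (FF FE).
my $in = "\xFF\xFE" . "h\0i\0";

# Generic 'UTF-16': decode requires the BOM and strips it;
# encode writes a BOM back, but in big-endian byte order.
my $text = decode('UTF-16', $in);        # "hi", BOM removed
my $be   = encode('UTF-16', $text);      # FE FF 00 68 00 69

# Explicit 'UTF-16LE': the BOM is kept as the character U+FEFF on decode,
# so it is still there when the text is encoded again.
my $text_le = decode('UTF-16LE', $in);   # "\x{FEFF}hi"
my $le      = encode('UTF-16LE', $text_le);

printf "generic UTF-16 round trip matches input:  %s\n",
    $be eq $in ? 'yes' : 'no';
printf "explicit UTF-16LE round trip matches input: %s\n",
    $le eq $in ? 'yes' : 'no';
```

With the generic name the round trip changes the byte order, which is why
the cmp in the quoted session reports a difference at byte 1. Using the
explicit little-endian name on both sides should leave the bytes unchanged.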

