
[rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds

From: ikegami via RT
Date: December 30, 2011 14:36
Subject: [rt.cpan.org #73623] [perl #107326] perl's unicode conversion fails when iconv succeeds
Message ID: rt-3.8.HEAD-6889-1325284579-449.73623-6-0@rt.cpan.org
<URL: https://rt.cpan.org/Ticket/Display.html?id=73623 >

On Fri Dec 30 14:00:32 2011, perlbug-followup@perl.org wrote:
> On Fri Dec 30 10:41:46 2011, LAWalsh wrote:
> > 
> > This is a bug report for perl from perl-diddler@tlinx.org,
> > generated with the help of perlbug 1.39 running under perl 5.12.3.
> > 
> > 
> > -----------------------------------------------------------------
> > [Please describe your issue here]
> > 
> > Was looking at ways to do upper/lower case compare, and bumped into
> > piconv as being a 'drop in replacement for "iconv"'.  So I decided to try
> > it thinking it would be a 'hoot' if it was faster.
> > 
> > Rather than faster, it choked at the beginning of my 98M test file
> > (i.e. I truncated the file to the first few lines, 672 bytes), which
> > reproduces the problem just fine .. Très sad...
> > 
> 
> You’re right:
> 
> $ piconv5.15.6 -f utf16 -t utf-8 /Users/sprout/Downloads/test.in
> UTF-16:Unrecognised BOM d at
> /usr/local/lib/perl5/5.15.6/darwin-thread-multi-2level/Encode.pm line
> 196, <$ifh> line 2.
> 
> The file begins with <FF><FE>.
> 
> If I use utf-16le explicitly, it does the first line correctly, but
> quickly switches to Chinese, which means it’s off by one byte.

It sounds like it's reading line-by-line, where a line is a sequence of
bytes ended by 0A. Of course, that's wrong for UTF-16le (and UTF-16be,
for that matter).
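
To illustrate the suspected failure mode, here is a minimal sketch. It assumes the input is cut after every 0A byte before decoding; that is only my guess at what piconv does, not something confirmed from its source. The sample text is made up, since the reporter's test.in is not part of this thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text; the reporter's test.in is not part of the thread.
my $bytes = encode('UTF-16LE', "ab\ncd\n");

# Byte-oriented "lines": cut after every 0A byte, as a raw readline would.
my @chunks = split /(?<=\x0A)/, $bytes;
printf "chunk lengths: %s\n", join ',', map { length $_ } @chunks;

# The second chunk starts with the 00 that belonged to the first newline,
# so every 16-bit unit after that point is misaligned by one byte.
my $shifted = decode('UTF-16LE', $chunks[1]);
printf "misaligned: %s\n",
    join ' ', map { sprintf 'U+%04X', ord $_ } split //, $shifted;

# Decoding the whole buffer first, then splitting on characters, keeps
# the 16-bit units aligned.
my @lines = split /(?<=\n)/, decode('UTF-16LE', $bytes);
chomp(my @clean = @lines);
printf "aligned: %s\n", join ' | ', @clean;
```

The byte-split chunks come out 5, 6 and 1 bytes long, and the second one decodes to U+6300 U+6400 U+0A00. The first two are CJK ideographs, which would match the "switches to Chinese" symptom. Decoding before splitting gives back "ab" and "cd".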

