<URL: https://rt.cpan.org/Ticket/Display.html?id=73623 >

On Fri Dec 30 14:00:32 2011, perlbug-followup@perl.org wrote:
> On Fri Dec 30 10:41:46 2011, LAWalsh wrote:
> >
> > This is a bug report for perl from perl-diddler@tlinx.org,
> > generated with the help of perlbug 1.39 running under perl 5.12.3.
> >
> >
> > -----------------------------------------------------------------
> > [Please describe your issue here]
> >
> > Was looking at ways to do upper/lower case compare, and bumped into
> > piconv as being a 'drop in replacement for "iconv"'. So I decided to try
> > it thinking it would be a 'hoot' if it was faster.
> >
> > Rather than faster, it choked at the beginning of my 98M test file
> > (i.e. I truncated the file to the first few lines, 672 bytes), which
> > reproduces the problem just fine .. Très sad...
> >
>
> You’re right:
>
> $ piconv5.15.6 -f utf16 -t utf-8 /Users/sprout/Downloads/test.in
> UTF-16:Unrecognised BOM d at
> /usr/local/lib/perl5/5.15.6/darwin-thread-multi-2level/Encode.pm line
> 196, <$ifh> line 2.
>
> The file begins with <FF><FE>.
>
> If I use utf-16le explicitly, it does the first line correctly, but
> quickly switches to Chinese, which means it’s off by one byte.

It sounds like it's reading line-by-line, where a line is a sequence of
bytes ended by 0A. Of course, that's wrong for UTF-16le (and UTF-16be,
for that matter).
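
To make the off-by-one concrete, here is a small sketch of the byte-level effect. It is written in Python purely as an illustration and is not piconv's or Encode's actual code; the sample string and all variable names are made up for the example. It assumes the reader treats a "line" as a run of bytes ending in the single byte 0A and decodes each chunk separately.

```python
import io

# Hypothetical two-line sample; the real test.in is not reproduced here.
text = "ab\ncd\n"
data = text.encode("utf-16-le")  # every character is 2 bytes; LF is 0A 00

# Naive reader: a "line" is a run of bytes ending in the single byte 0A.
# This cuts each LF code unit (0A 00) in half.
naive_lines = io.BytesIO(data).readlines()

# Decode each chunk on its own, as a line-at-a-time converter would.
# The first chunk has an odd length; every later chunk starts with the
# stray 00, so its code units are shifted by one byte.
naive_decoded = [
    chunk.decode("utf-16-le", errors="replace") for chunk in naive_lines
]

# Decoding the whole byte stream first and splitting afterwards keeps
# the 2-byte alignment intact.
correct_lines = data.decode("utf-16-le").splitlines(keepends=True)

for raw, shown in zip(naive_lines, naive_decoded):
    print(raw.hex(" "), "->", ascii(shown))
print(correct_lines)
```

The second chunk is 00 63 00 64 00 0A, which as little-endian code units reads 6300 6400 0A00, so "cd" comes out as CJK characters. That matches the "first line correct, then Chinese" symptom. With -f utf16 the same split would presumably leave the second chunk without a BOM, which would fit the "Unrecognised BOM" error at line 2.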