perl.perl5.porters | Postings from September 2013

[perl #80058] [Bug Report] Bad \n convert, using UTF-16 on Win32

From: Tony Cook via RT
Date: September 20, 2013 05:27
Subject: [perl #80058] [Bug Report] Bad \n convert, using UTF-16 on Win32
Message ID: rt-3.6.HEAD-1873-1379654851-1889.80058-15-0@perl.org

On Wed Dec 01 04:38:26 2010, mezmerik@gmail.com wrote:
> This is a bug report for perl from mezmerik@gmail.com,
> generated with the help of perlbug 1.39 running under perl 5.12.2.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> Hello,
> 
> I'm using ActivePerl 5.12.2 on Windows 7.
> 
> Perl for Win32 has a feature to convert a single "LF" (without
> preceding "CR") to "CRLF", but my perl seems to determine what "LF" is
> on UTF-16 incorrectly. In ANSI and UTF-8 files, LF's bytecode is "0A";
> in UTF-16, LF should be "000A" (Big Endian)  or "0A00" (Little
> Endian), but my perl seems to regard a single "0A" as LF too! Thus, she
> will do the wrong thing, which is adding a "0D" before "0A" (my perl
> also regards "0D" as CR; the right CR in UTF-16 should be "000D").

I believe this is a known problem with the way the default :crlf layer
works on Win32.

Since the layer is immediately on top of the :unix layer, it's working
at a byte level, adding CRs to the bytes *after* translation from
characters.

This means you get other broken behaviour, such as inserting a 0d byte
before characters in the U+0Axx range, whose UTF-16BE encoding starts
with a 0a byte:

C:\Users\tony>perl -e "open my $fh, '>:encoding(utf16be)', 'foo.txt' or die $!; print $fh qq(\x{a90}hello\n)"

C:\Users\tony>perl -e "binmode STDIN; $/ = \16; while (<>) { print unpack('H*', $_), qq'\n' }" <foo.txt
0d0a9000680065006c006c006f000d0a
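
For anyone without a Win32 perl to hand, the byte-level effect can be
approximated on any platform. This is only a simulation under stated
assumptions: crlf_at_byte_level is a hypothetical helper, and a plain
substitution on the encoded bytes stands in for a :crlf layer sitting
below the encoding layer. It does not exercise PerlIO itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of perl): encode first, then insert a
# 0d before every 0a *byte*, which is what a :crlf layer below the
# encoding layer effectively does on output.
sub crlf_at_byte_level {
    my ($text) = @_;
    my $bytes = encode('UTF-16BE', $text);
    $bytes =~ s/\x0a/\x0d\x0a/g;    # byte-wise, blind to 16-bit units
    return unpack 'H*', $bytes;
}

# U+0A90 has 0a as its high byte, so it picks up a stray 0d in front,
# and the real newline (00 0a) becomes 00 0d 0a instead of 00 0d 00 0a.
print crlf_at_byte_level("\x{a90}hello\n"), "\n";
# 0d0a9000680065006c006c006f000d0a

# Any code point with 0a as its low byte is hit as well, e.g. U+010A.
print crlf_at_byte_level("\x{10a}"), "\n";
# 010d0a
```

The first line printed matches the hex dump above, which suggests the
substitution is a fair model of what the default layer order does.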

The workaround (or perhaps the only real fix) is to clear the :crlf
layer with :raw and add it back on above your Unicode layer:

C:\Users\tony>perl -e "open my $fh, '>:raw:encoding(utf16be):crlf', 'foo.txt' or die $!; print $fh qq(\x{a90}hello\n)"

C:\Users\tony>perl -e "binmode STDIN; $/ = \16; while (<>) { print unpack('H*', $_), qq'\n' }" <foo.txt
0a9000680065006c006c006f000d000a
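
The same layer stack can be put in a script rather than a one-liner. A
minimal sketch, assuming a temporary file from File::Temp in place of
foo.txt; write_utf16be_crlf and hex_dump_file are hypothetical helper
names introduced here for illustration, not anything perl provides.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text with :crlf stacked *above* the
# encoding layer, as in the workaround. :raw first clears the default
# layers, so the order no longer depends on the platform.
sub write_utf16be_crlf {
    my ($path, $text) = @_;
    open my $out, '>:raw:encoding(UTF-16BE):crlf', $path
        or die "Cannot open $path for writing: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read the file back as raw bytes and hex-dump it.
sub hex_dump_file {
    my ($path) = @_;
    open my $in, '<:raw', $path
        or die "Cannot open $path for reading: $!";
    local $/;    # slurp mode: read the whole file in one go
    my $bytes = <$in>;
    close $in;
    return unpack 'H*', $bytes;
}

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

write_utf16be_crlf($path, "\x{a90}hello\n");
print hex_dump_file($path), "\n";
```

This should print the same bytes as the hex dump above: U+0A90 is left
alone and the newline comes out as 000d000a. For reading, the matching
stack would presumably be '<:raw:encoding(UTF-16BE):crlf'.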

The only way I can see to fix this would be to make :crlf special, so it
always remains on top, but I suspect that's going to be fairly ugly from
an implementation point of view - do we make other layers special too?

(/me avoids going wild with speculation)

Tony

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=80058
