perl.i18n | Postings from February 2006

Using :encoding and :crlf together?

Ciaran Hamilton
February 6, 2006 05:12

I'm new to this list, and I've tried searching the archives but I 
couldn't find anything like this. I'm using Perl v5.8.7, and I'm 
currently tearing my hair out trying to get the :encoding and :crlf 
layers to play nicely with each other.

My problem is that I'm developing a system which, as part of its job, 
needs to be able to read and write files in most encodings. I'm using 
:encoding for this - so far, so good.

For readability and compatibility reasons, these files should have CR/LF 
line endings, although the encoding requirement applies with or 
without them. So I figured the :crlf layer would do the job.

Unfortunately, trying to get :crlf and :encoding to do the Right Thing 
with each other seems to be like trying to pull hens' teeth. Here's an 
example of what I was doing at first:

open(FILE, ">:crlf:encoding(UTF-8)", "some-file.txt");

All seemed to work fine until I tested outputting as UTF-16 
instead of UTF-8, at which point I discovered that the encoding layer 
wasn't encoding the inserted CRs, which corrupted the UTF-16 file. 
D'oh! Okay, so swap the layers:

open(FILE, ">:encoding(UTF-16):crlf", "some-file.txt");
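
To make the difference between the two layer orders concrete, here is a 
minimal sketch that does both steps by hand with Encode instead of PerlIO 
layers. The helper names are hypothetical, and UTF-16BE is an assumption 
chosen only to keep the byte order mark out of the output; it approximates 
what the layers do rather than reproducing their implementation.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helpers that mimic the two layer orders by hand.

# Encode first, then insert CR at the byte level. This approximates
# the output side of ">:crlf:encoding(...)".
sub encode_then_crlf {
    my ($str) = @_;
    my $bytes = encode("UTF-16BE", $str);
    $bytes =~ s/\x0A/\x0D\x0A/g;    # a lone 0x0D byte splits a 2-byte unit
    return $bytes;
}

# Insert CR as a character, then encode. This is the result that
# ">:encoding(...):crlf" is meant to produce.
sub crlf_then_encode {
    my ($str) = @_;
    (my $copy = $str) =~ s/\n/\r\n/g;
    return encode("UTF-16BE", $copy);
}

# Print a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02X", $_ } unpack "C*", $bytes;
}

print hex_dump(encode_then_crlf("Hi\n")), "\n";   # 00 48 00 69 00 0D 0A
print hex_dump(crlf_then_encode("Hi\n")), "\n";   # 00 48 00 69 00 0D 00 0A
```

The first result has an odd number of bytes, so every UTF-16 code unit 
after the CR is misaligned; the second is valid UTF-16.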

With :crlf listed after :encoding, everything seems like it should work, 
but now I get problems trying to print some characters. For example, 
trying to print a \x{A3} (a British pound sign) results in:

"Malformed UTF-8 character (unexpected continuation byte 0xa3, with no 
preceding start byte) in null operation at ./ line 6."

...and the output file contains a null character where the sign should 
be. Strangely, using a literal £ UTF-8 sequence (i.e. C2 A3) in the Perl 
file works fine. Here's the file that generates the above error:


open(FILE, ">:encoding(UTF-16):crlf", "test");
print FILE "Test \x{A3}45!\n";
print FILE "Test!\n";
close(FILE);

Yes, line 6 is the close() line. Removing :crlf from the layers fixes 
the problem, so I'm wondering if this is a bug in the implementation of 
:crlf. I'd really like to have some sort of transparent CR/LF conversion 
though, as it makes things a lot easier.

Is this a known problem?

  - Ciaran.
