develooper Front page | perl.perl5.porters | Postings from November 2000

Re: [ID 20001114.001] use utf8;use charnames; is incorrect for \x{80}-\x{FF}

Nick Ing-Simmons
November 14, 2000 13:40
Andrew McNaughton writes:
>I probably don't care how perl represents the string internally, so long
>as it knows how it represents it, and I don't have to think about it.

That is the goal.

>perl's pragmas allow you to specify what a string is (char sequence or
>byte sequence) at the time it is created.  

No, they don't. Strings are always char sequences.

"Byte" sequences are just sequences of chars in the range 0..255.
Making sure that is the case is "your" problem to some extent.
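
A minimal sketch of such a check, assuming a modern perl (5.8 or later). The helper name is_byte_string is hypothetical, used here only for illustration; it is not part of perl:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every char in $s has an ordinal in
# 0..255 (so the string can stand in for a byte sequence), else 0.
sub is_byte_string {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/ ? 1 : 0;
}

print is_byte_string("caf\x{E9}"), "\n";   # prints 1: all chars are <= 255
print is_byte_string("\x{263A}"), "\n";    # prints 0: U+263A is outside 0..255
```

Running a check like this before handing a string to code that expects bytes is one way of making the range guarantee explicit.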

The utf8 pragma tells perl that the source file it is reading has its
chars encoded as utf8; otherwise, for backward compatibility, it assumes
iso8859-1.
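
To illustrate the effect of the pragma on source text, here is a small sketch. It assumes a modern perl and that the script itself is saved as UTF-8; the two sub names are made up for the example:

```perl
use strict;
use warnings;

# Assumption: this file is saved as UTF-8, so the literal below sits on
# disk as the two bytes 0xC3 0xA9.

# Without the pragma, each source byte becomes one iso8859-1 char.
sub len_without_pragma { no utf8;  return length("é"); }

# With the pragma (lexically scoped to this block), the two source bytes
# are decoded as the single char U+00E9.
sub len_with_pragma    { use utf8; return length("é"); }

print len_without_pragma(), "\n";   # two chars
print len_with_pragma(), "\n";      # one char
```

The pragma only describes the encoding of the program text; it says nothing about data read from files or sockets.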

>Then perl apparently forgets.

It should not do that, though 5.6.0 _may_, as it has far more bugs in this
area than the current sources.

BUT: Please note that 'use bytes' is an _explicit_ instruction to perl to 
"forget what you know about these chars". 

>I do care if the internal encoding is exposed and called utf-8 but is not
>utf-8, or the differences are undocumented.  

So do NOT "use bytes" - that says "expose the internal encoding" and 
the internal encoding is not yet reliably documented.
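
A rough sketch of what "expose the internal encoding" means in practice, assuming a modern perl (5.8 or later) where utf8::upgrade, utf8::downgrade and bytes::length are available. The helper name lengths is hypothetical:

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without switching on byte semantics

my $native = "\x{E9}";
utf8::downgrade($native);   # force the one-byte-per-char internal form

my $wide = "\x{E9}";
utf8::upgrade($wide);       # force the utf8 internal form

# Hypothetical helper: returns [length in chars, length of internal bytes].
sub lengths {
    my ($s) = @_;
    return [ length($s), bytes::length($s) ];
}

print "same string\n" if $native eq $wide;                # they compare equal
printf "%d char, %d byte(s)\n", @{ lengths($native) };    # 1 char, 1 byte(s)
printf "%d char, %d byte(s)\n", @{ lengths($wide) };      # 1 char, 2 byte(s)
```

The two strings are the same as far as char semantics are concerned; only the byte-level view differs, which is why code that peeks at it can behave inconsistently.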

>I care about having a readily available function to get strings into a
>portable format.  If perl considers its character format to be an internal
>matter, then print has to deal with it.

Specifying the encoding etc. for print is a work in progress.
Until that is in place, perl5.6+ is only "marketing compatible"
with Unicode/utf8...
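
For readers on later perls: the sketch below is not something available in 5.6.0. It assumes perl 5.8 or later, where the Encode module ships with perl and gives an explicit way to turn chars into a portable byte sequence:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars  = "\x{263A}";                # one char, U+263A
my $octets = encode("UTF-8", $chars);   # explicit UTF-8 byte sequence

printf "%d char -> %d octets\n", length($chars), length($octets);
# prints "1 char -> 3 octets"

# Show the octets in hex.
print join(" ", map { sprintf "%02X", ord } split //, $octets), "\n";
# prints "E2 98 BA"
```

In those later perls the same conversion can also be attached to a filehandle, for example with binmode(STDOUT, ":encoding(UTF-8)"), so that print handles the encoding itself.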

Nick Ing-Simmons
