
Re: [ID 20001114.001] use utf8;use charnames; is incorrect for \x{80}-\x{FF}

From:
Andrew McNaughton
Date:
November 14, 2000 10:59
Subject:
Re: [ID 20001114.001] use utf8;use charnames; is incorrect for \x{80}-\x{FF}
Message ID:
Pine.BSF.4.10.10011150621250.7938-100000@sub.internal.cwa.co.nz
On Tue, 14 Nov 2000, Simon Cozens wrote:

> Date: Tue, 14 Nov 2000 17:20:20 +0000
> From: Simon Cozens <simon@cozens.net>
> To: perl5-porters@perl.org
> Subject: Re: [ID 20001114.001] use utf8;use charnames; is incorrect for
>     \x{80}-\x{FF}
> 
> On Wed, Nov 15, 2000 at 03:06:19AM +1300, Andrew McNaughton wrote:
> > On Tue, 14 Nov 2000, Nick Ing-Simmons wrote:
> > > Andrew McNaughton <andrew@tki.org.nz> writes:
> > > >use utf8;
> > > >use charnames ':full';
> > > >$text .= "\N{LATIN CAPITAL LETTER A WITH DIAERESIS}";
> > > >
> > > >
> > > >This fails because of the final line of &charnames::charnames.  It returns an
> > > >8 bit value.
> > > 
> > > It is an 8-bit value - that is the UNICODE codepoint is < 256.
> > 
> > The unicode codepoint may be less than 256, but in utf8 2 byte characters
> > start from codepoint 128, not 256.
> 
> Why do you think that Perl should encode this in UTF8?




I probably don't care how perl represents the string internally, so long
as it knows how it represents it, and I don't have to think about it.

perl's pragmas allow you to specify what a string is (char sequence or
byte sequence) at the time it is created.  Then perl apparently forgets.

I do care if the internal encoding is exposed and called utf-8 but is not
utf-8, or the differences are undocumented.

I care about having a readily available function to get strings into a
portable format.  If perl considers its character format to be an internal
matter, then print has to deal with it.
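
As a rough illustration of what "print has to deal with it" could look
like, here is a minimal sketch.  It assumes a perl with PerlIO encoding
layers (5.8 or later), which is not something this thread discusses.
The helper name write_utf8 is hypothetical, and an in-memory filehandle
stands in for a real output stream so the resulting bytes can be inspected.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: print $text through a UTF-8 output layer and
# return the bytes that were actually written.
# Assumes PerlIO encoding layers are available (perl 5.8 or later).
sub write_utf8 {
    my ($text) = @_;
    my $buf = '';
    # In-memory filehandle standing in for a file or socket.
    open my $fh, '>:encoding(UTF-8)', \$buf or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buf;
}

my $bytes = write_utf8("\N{LATIN CAPITAL LETTER A WITH DIAERESIS}");
print uc(unpack 'H*', $bytes), "\n";    # hex dump of the bytes written
```

With a layer like this on the handle, the program only ever deals in
characters, and the conversion to a portable byte format happens at output.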

chr currently produces a perl character, not a utf-8 character, even if
'use utf8' is in effect.  I haven't yet found anything to extract utf-8
reliably except for eval statements with messy quoting.  This could be
wrapped up in a module, but text processing is supposed to be a central
function of perl.
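
For comparison, a minimal sketch of the distinction between a perl
character and its utf-8 bytes.  It assumes the Encode module, which is
bundled with later perls (5.8 onward) and is not mentioned anywhere in
this thread.  The helper name utf8_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 encoding of a character string
# as a byte string.  Assumes the Encode module is installed.
sub utf8_bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# chr() yields a single character with code point 0xC4
# (LATIN CAPITAL LETTER A WITH DIAERESIS), however perl stores it.
my $char  = chr(0xC4);
my $bytes = utf8_bytes($char);

printf "characters: %d\n", length $char;
printf "bytes:      %d\n", length $bytes;
printf "hex:        %s\n", uc unpack('H*', $bytes);
```

The character string has length 1 and the encoded string has length 2,
which is the two-byte form for code points from 128 upward referred to
earlier in the thread.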






--
Andrew McNaughton
Te Kete Ipurangi: The Online Learning Centre
andrew@tki.org.nz
Ph: 64 4 382 6500
Fax: 64 4 382 6509
Mobile: 021 323 076

PO Box 19-098
Wellington, NZ
http://www.tki.org.nz/



