On Tue, 14 Nov 2000, Simon Cozens wrote:

> Date: Tue, 14 Nov 2000 17:20:20 +0000
> From: Simon Cozens <simon@cozens.net>
> To: perl5-porters@perl.org
> Subject: Re: [ID 20001114.001] use utf8;use charnames; is incorrect for \x{80}-\x{FF}
>
> On Wed, Nov 15, 2000 at 03:06:19AM +1300, Andrew McNaughton wrote:
> > On Tue, 14 Nov 2000, Nick Ing-Simmons wrote:
> > > Andrew McNaughton <andrew@tki.org.nz> writes:
> > > >use utf8;
> > > >use charnames ':full';
> > > >$text .= "\N{LATIN CAPITAL LETTER A WITH DIAERESIS}";
> > > >
> > > >
> > > >This fails because of the final line of &charnames::charnames. It returns an
> > > >8 bit value.
> > >
> > > It is an 8-bit value - that is the UNICODE codepoint is < 256.
> >
> > The unicode codepoint may be less than 256, but in utf8 2 byte characters
> > start from codepoint 128, not 256.
>
> Why do you think that Perl should encode this in UTF8?

I probably don't care how perl represents the string internally, so long as it knows how it represents it and I don't have to think about it. perl's pragmas allow you to specify what a string is (char sequence or byte sequence) at the time it is created. Then perl apparently forgets.

I do care if the internal encoding is exposed and called utf-8 but is not utf-8, or if the differences are undocumented.

I care about having a readily available function to get strings into a portable format. If perl considers its character format to be an internal matter, then print has to deal with it. chr currently produces a perl character, not a utf-8 character, even if 'use utf8' is in effect. I haven't yet found anything to extract utf-8 reliably except for eval statements with messy quoting. This could be wrapped up in a module, but text processing is supposed to be a central function of perl.
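To illustrate the codepoint 128 boundary mentioned above, here is a minimal sketch. It assumes a perl that ships the Encode module (5.8 or later, so not the perl under discussion in this thread), and utf8_octets is a hypothetical helper name, not an existing function. It shows the kind of "get me portable utf-8" call being asked for, not how charnames or chr work internally.

```perl
use strict;
use warnings;
use Encode qw(encode);   # assumption: Encode is available (perl 5.8+)

# Hypothetical helper (not from the thread): return the UTF-8 octets
# for a string of perl characters.
sub utf8_octets {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS: the codepoint is below 256
# but not below 128, so UTF-8 needs two octets for it.
my $octets = utf8_octets(chr(0xC4));

# Print each octet in hex.
print join(' ', map { sprintf '%02X', ord } split //, $octets), "\n";
```

Under those assumptions this prints "C3 84": one character, two octets, even though the codepoint fits in 8 bits.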
--
Andrew McNaughton
Te Kete Ipurangi: The Online Learning Centre
andrew@tki.org.nz
Ph: 64 4 382 6500  Fax: 64 4 382 6509  Mobile: 021 323 076
PO Box 19-098 Wellington, NZ
http://www.tki.org.nz/