perl.perl5.porters | Postings from February 2008

Re: use encoding 'utf8' bug for Latin-1 range

Glenn Linderman
February 28, 2008 14:21
On approximately 2/28/2008 1:12 PM, came the following characters from 
the keyboard of Nicholas Clark:
> On Thu, Feb 28, 2008 at 08:34:12PM +0100, Tels wrote:

>> In any event, I don't see why "use utf-8" shouldn't die when the source 
>> contains non-utf-8. After all, you just told Perl it does ;)
> I would have liked it if it did. But it already seems that we have it the
> wrong way, and I'd prefer to deprecate the wrongness, than change it again.
> Nicholas Clark

I think Tels made a typo... but what he wrote, "use utf-8;", currently 
produces "Can't locate".  Of course, I doubt he meant to pass 
negative 8 as a parameter to the module...

But maybe, with UTF-8 and UTF-16 both existing, a well-written "use utf" 
module with parameters of -8 and -16 wouldn't be the worst idea in the 
world!  Of course, UTF-16 must be detected via BOM, heuristics, or 
environment, as it isn't compatible with UTF-8 or ASCII.
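To illustrate the BOM part only, here is a minimal sketch of sniffing the 
leading bytes of a source file.  The name sniff_bom is hypothetical, not 
anything in core, and the heuristics and environment cases are left out.  
Note that a UTF-32LE BOM also begins with FF FE, so a real detector would 
have to look further.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding from a leading byte order mark.
# Returns undef when no BOM is present (heuristics or the environment
# would have to decide in that case).
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;   # caveat: also matches a UTF-32LE BOM
    return undef;
}

for my $sample ("\xEF\xBB\xBFprint 1;", "\xFE\xFF\x00p", "\xFF\xFEp\x00", "print 1;") {
    my $guess = sniff_bom($sample);
    print defined $guess ? "$guess\n" : "no BOM\n";
}
```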

Anyway, "use utf-8" currently doesn't mean anything, so it could be made 
to mean "die if source doesn't contain UTF-8 encoded number sequences".
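As a rough sketch of what such a check amounts to, the Encode module can 
already be asked to croak on malformed input.  The helper name 
is_valid_utf8 is hypothetical, and this checks a byte string rather than 
hooking into the parser, so it only shows the test itself, not how a 
pragma would apply it to source code.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: true if $bytes is well-formed (strict) UTF-8.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it
# from modifying the buffer it is given.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# "\xC3\xA9" is e-acute encoded as UTF-8; a lone "\xE9" is its Latin-1 byte.
for my $src ("caf\xC3\xA9 au lait", "caf\xE9 au lait") {
    print is_valid_utf8($src) ? "well-formed UTF-8\n" : "not UTF-8\n";
}
```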

There is also the issue of whether codepoints not defined by Unicode, but 
properly encoded in the UTF-8 encoding scheme, should be accepted (as 
they are now).  I think they should be, to allow easier handling of 
future codepoints, although operators that understand and apply Unicode 
semantics will only apply the semantics they understand from the version 
of the standard that they understand.
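A small sketch of that behaviour, assuming U+0378 is still unassigned in 
the version of Unicode your perl was built with (it is in the versions I 
know of, but that is an assumption).  The codepoint encodes and decodes 
like any other, while operators with Unicode semantics have nothing to 
apply to it.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Assumption: U+0378 is unassigned in the Unicode version this perl knows.
my $char  = chr(0x378);
my $bytes = encode('UTF-8', $char);    # the two bytes 0xCD 0xB8
my $back  = decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC);

printf "encoded as: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "round trip keeps the codepoint: %s\n", $back eq $char ? "yes" : "no";

# Case mapping has no data for an unassigned codepoint, and the
# general category reported for it is Cn (unassigned).
printf "uc() changes it: %s\n", uc($char) eq $char ? "no" : "yes";
printf "matches \\p{Cn}: %s\n", $char =~ /\p{Cn}/ ? "yes" : "no";
```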

Glenn --
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
