develooper Front page | perl.perl5.porters | Postings from September 2009

Perl's intolerance of U+FFFE, was Re: [perl #69414] Case-insensitive utf8 matching problem

From: karl williamson
Date: September 30, 2009 10:17
Subject: Perl's intolerance of U+FFFE, was Re: [perl #69414] Case-insensitive utf8 matching problem
Message ID: 4AC39259.3080607@khwilliamson.com

I don't, unfortunately, have anything useful to say about Tom's post. 
But can someone explain to me why Perl forbids FFFE?

I am guessing that it is from a misreading of the standard.  This is one 
of the 66 "noncharacters" in the standard, defined thusly 
(http://www.unicode.org/versions/Unicode5.0.0/b1.pdf):

Noncharacter. A code point that is permanently reserved for internal use
and that should never be interchanged. Noncharacters consist of the
values U+nFFFE and U+nFFFF (where n is from 0 to 10 base 16), and the
values U+FDD0..U+FDEF.
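
To make the count of 66 concrete, here is a minimal sketch of that
definition as a membership test. It is written in Python purely to
illustrate the arithmetic; the names is_noncharacter and NONCHARACTERS
are made up for this example and say nothing about how Perl itself
classifies these code points.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the Unicode noncharacters.

    Hypothetical helper for illustration; follows the quoted definition.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # U+FDD0..U+FDEF: a contiguous run of 32 code points.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for n in 0..0x10: the last two code points
    # of each of the 17 planes. Masking off the low bit lets one
    # comparison cover both values.
    return (cp & 0xFFFE) == 0xFFFE


# Enumerate every noncharacter: 32 + 17 * 2 = 66.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(NONCHARACTERS))
```

The 32 code points of U+FDD0..U+FDEF plus two per plane across 17 planes
account for the 66 mentioned above.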

By "internal use", it means that a given program is free to use them 
internally, but they should not be written into data intended to be read 
by another program.  Thus, a legitimate use for them would be as in-band 
temporary markers, since interchanged data should never contain them.
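
As a rough sketch of what such an in-band temporary marker might look
like, the following uses U+FFFE to protect escaped commas while a line is
split into fields. It is in Python only for illustration; MARK and
split_fields are hypothetical names, and the escaping convention
(backslash-comma) is an assumption made up for the example.

```python
MARK = "\ufffe"  # noncharacter used as a private, in-process sentinel


def split_fields(line: str) -> list[str]:
    """Split on commas, treating backslash-comma as a literal comma.

    Hypothetical helper for illustration only.
    """
    # Interchanged data should never contain the sentinel, but check
    # anyway rather than silently corrupting the input.
    if MARK in line:
        raise ValueError("input already contains U+FFFE")
    # Hide escaped commas behind the sentinel so split() ignores them.
    protected = line.replace("\\,", MARK)
    # Restore the literal commas; the sentinel never leaves this function.
    return [field.replace(MARK, ",") for field in protected.split(",")]


fields = split_fields("a\\,b,c")
print(fields)
```

The marker exists only between the two replace steps, so nothing
containing U+FFFE is ever written out for another program to read.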

But Perl doesn't let you.  Is there some real reason for that, or did 
the implementors just think that they were illegal, period?

There is an open bug asking that Perl allow them, dating from 2006
http://rt.perl.org/rt3/Public/Bug/Display.html?id=38293

Someone submitted a patch that did that
