perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

August 23, 2011 05:11
Tom Christiansen wrote:
>Do I misunderstand strict UTF-8?  I thought U+FFFE was a noncharacter guaranteed
>not to occur in a conformant UTF-8 stream.

It's syntactically valid to encode the codepoint U+FFFE in UTF-8.
However, U+FFFE is a noncharacter, and so is forbidden in interchange
of textual data.  We discussed this issue a couple of months ago,
in the context of PerlIO layers.  We figured it was best to separate
the encoding itself (UTF-8), and strictness regarding the encoding
syntax, from strictness regarding which codepoints are allowed through.
I haven't noticed anyone implementing the new layers that were discussed,
but there was some cleanup in the existing utf8 warnings.
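
To illustrate the separation described above, here is a minimal sketch of the two kinds of strictness as independent steps. It is written in Python purely for illustration and is not the PerlIO layer design that was discussed. The function names (decode_syntax_only, reject_noncharacters, is_noncharacter) are hypothetical. The first step checks only that the bytes are well-formed UTF-8; the second applies a codepoint policy to the decoded text.

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF, plus the
    last two codepoints of every plane (U+FFFE, U+FFFF, U+1FFFE, ...)."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def decode_syntax_only(data: bytes) -> str:
    """Step 1 (hypothetical name): check encoding syntax only.
    Python's UTF-8 decoder raises UnicodeDecodeError on malformed byte
    sequences but lets noncharacters through."""
    return data.decode("utf-8")


def reject_noncharacters(text: str) -> str:
    """Step 2 (hypothetical name): a separate policy on which codepoints
    are allowed through, independent of the encoding."""
    for index, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise ValueError(f"noncharacter U+{ord(ch):04X} at index {index}")
    return text


# U+FFFE has a perfectly ordinary three-byte UTF-8 encoding.
raw = "\ufffe".encode("utf-8")
text = decode_syntax_only(raw)      # passes: the byte syntax is fine

try:
    reject_noncharacters(text)      # fails: the codepoint policy objects
    policy_result = "accepted"
except ValueError as err:
    policy_result = f"rejected ({err})"

print(raw.hex(), policy_result)
```

Keeping the two steps apart means a caller can ask for strict encoding syntax without also committing to a particular policy on noncharacters, or stack the second check on top when exchanging text with other parties.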

