Re: BOMs as noncharacters

From: Zefram
Date: August 23, 2011 05:11
Subject: Re: BOMs as noncharacters
Message ID: 20110823121130.GB14463@lake.fysh.org

Tom Christiansen wrote:
>Do I misunderstand strict UTF-8?  I thought U+FFFE was a noncharacter guaranteed
>not to occur in a conformant UTF-8 stream.

It's syntactically valid to encode the codepoint 0xFFFE in UTF-8.
However, U+FFFE is a noncharacter, and so is forbidden in interchange
of textual data.  We discussed this issue a couple of months ago,
in the context of PerlIO layers.  We figured it was best to separate
the encoding itself (UTF-8), and strictness regarding the encoding
syntax, from strictness regarding which codepoints are allowed through.
I haven't noticed anyone implementing the new layers that were discussed,
but there was some cleanup in the existing utf8 warnings.
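
To illustrate the separation described above, here is a minimal sketch in Python (not Perl, and not the PerlIO layers that were discussed). The function names `decode_utf8_syntax`, `is_noncharacter` and `check_interchange` are hypothetical, chosen only for this example. It assumes Python's built-in "utf-8" codec, which checks encoding syntax but lets noncharacters through, so the codepoint policy has to be a separate step.

```python
def decode_utf8_syntax(data: bytes) -> str:
    """Step 1: strictness about the encoding syntax only.

    Raises UnicodeDecodeError on malformed UTF-8, but accepts any
    well-formed sequence, including the one for U+FFFE.
    """
    return data.decode("utf-8")


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters.

    These are U+FDD0..U+FDEF plus the last two codepoints of each of
    the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ...).
    """
    if not 0 <= cp <= 0x10FFFF:
        return False
    return (0xFDD0 <= cp <= 0xFDEF) or ((cp & 0xFFFE) == 0xFFFE)


def check_interchange(text: str) -> str:
    """Step 2: strictness about which codepoints are allowed through.

    Raises ValueError on the first noncharacter found; otherwise
    returns the text unchanged.
    """
    for offset, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise ValueError(
                f"noncharacter U+{ord(ch):04X} at offset {offset}"
            )
    return text


if __name__ == "__main__":
    raw = b"\xef\xbf\xbe"           # UTF-8 byte sequence for U+FFFE
    text = decode_utf8_syntax(raw)  # passes: the syntax is fine
    try:
        check_interchange(text)     # fails: codepoint policy rejects it
    except ValueError as err:
        print("rejected:", err)
```

Keeping the two checks apart means a caller can accept well-formed UTF-8 for internal use and apply the noncharacter policy only where text is actually interchanged.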

-zefram