develooper Front page | perl.perl5.porters | Postings from August 2011

Re: BOMs as noncharacters

From: Karl Williamson
Date: August 27, 2011 14:58
Subject: Re: BOMs as noncharacters
Message ID: 4E596864.2050702@khwilliamson.com
On 08/23/2011 06:11 AM, Zefram wrote:
> Tom Christiansen wrote:
>> Do I misunderstand strict UTF-8?  I thought U+FFFE was a noncharacter guaranteed
>> not to occur in a conformant UTF-8 stream.
>
> It's syntactically valid to encode the codepoint 0xfffe in UTF-8.
> However, U+fffe is a non-character, and so is forbidden in interchange
> of textual data.

I feel the need to make what I consider a crucial clarification
here: "They are forbidden in OPEN interchange of textual data." A
private agreement to interchange them overrides the general rule, which
is why they aren't always forbidden.
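
To make the two points above concrete, here is a minimal Python sketch (not Perl's code). It shows that U+FFFE, which is what a byte-swapped BOM (U+FEFF) reads as, is a well-formed UTF-8 sequence that a strict decoder accepts, while a separate check flags it as a noncharacter. The function names and the `private_agreement` flag are hypothetical, standing in for whatever agreement the communicating parties make.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 Unicode noncharacters:
    U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in each of the 17 planes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # (cp & 0xFFFE) == 0xFFFE holds exactly when the low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def check_open_interchange(text: str, private_agreement: bool = False) -> None:
    """Hypothetical policy check: reject noncharacters in open interchange,
    unless the parties have privately agreed to exchange them."""
    if private_agreement:
        return
    for i, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            raise ValueError(f"noncharacter U+{ord(ch):04X} at index {i}")


# U+FFFE encodes to a well-formed three-byte UTF-8 sequence...
encoded = "\ufffe".encode("utf-8")
assert encoded == b"\xef\xbf\xbe"
# ...and a strict UTF-8 decoder accepts it back: byte-level validity
# says nothing about whether the code point is a noncharacter.
assert encoded.decode("utf-8", errors="strict") == "\ufffe"
assert is_noncharacter(0xFFFE)
assert not is_noncharacter(0xFEFF)  # the BOM itself is an ordinary character
```

Under a private agreement, `check_open_interchange(text, private_agreement=True)` lets the same text through unchanged.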

> We discussed this issue a couple of months ago,
> in the context of PerlIO layers.  We figured it was best to separate
> the encoding itself (UTF-8), and strictness regarding the encoding
> syntax, from strictness regarding which codepoints are allowed through.
> I haven't noticed anyone implementing the new layers that were discussed,
> but there was some cleanup in the existing utf8 warnings.
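
The separation described above might look roughly like this. It is a Python sketch under stated assumptions, not the PerlIO layers that were actually discussed: `decode_layer` enforces only UTF-8 syntax, and `codepoint_filter` applies a caller-chosen policy to the already-decoded code points. All names here are hypothetical.

```python
from typing import Callable


def decode_layer(data: bytes, strict: bool = True) -> str:
    """Stage 1 (encoding syntax only): turn UTF-8 bytes into code points.
    strict=True raises UnicodeDecodeError on malformed input;
    strict=False substitutes U+FFFD instead. No code point policy here."""
    return data.decode("utf-8", errors="strict" if strict else "replace")


def codepoint_filter(text: str, allowed: Callable[[int], bool]) -> str:
    """Stage 2 (code point policy only): pass text through unchanged if
    every code point satisfies the caller-supplied predicate."""
    for i, ch in enumerate(text):
        if not allowed(ord(ch)):
            raise ValueError(f"disallowed code point U+{ord(ch):04X} at index {i}")
    return text


def allow_all(cp: int) -> bool:
    return True


def reject_noncharacters(cp: int) -> bool:
    # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
    return not (0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE)


# The same well-formed input passes or fails depending only on the policy stage.
text = decode_layer(b"abc\xef\xbf\xbe")  # "abc" followed by U+FFFE
assert codepoint_filter(text, allow_all) == "abc\ufffe"
```

Because the policy is a plain predicate, a private agreement to exchange noncharacters amounts to passing a different predicate, without touching the decoding stage.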

I've been waiting for Nicholas to finish his scheme to create a fast,
table-driven byte-validation layer.
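
The message doesn't say how Nicholas's layer works, so the following is only a generic illustration of table-driven validation, not his design. This Python sketch maps each byte to a class, then advances a small DFA with one table lookup per byte. The accepted sequences follow the Unicode Standard's table of well-formed UTF-8 byte sequences (Table 3-7), so overlongs, surrogates, and values above U+10FFFF are rejected. Noncharacters such as U+FFFE pass, which keeps syntax checking separate from code point policy. The byte-class and state names are made up for this example.

```python
# Byte classes (columns of the transition table).
(ASCII, C_80_8F, C_90_9F, C_A0_BF, L2, L3_E0, L3, L3_ED,
 L4_F0, L4, L4_F4, BAD) = range(12)

# DFA states (rows of the transition table).
(ACCEPT, NEED1, NEED2, AFTER_E0, AFTER_ED,
 NEED3, AFTER_F0, AFTER_F4, REJECT) = range(9)


def _build_class_table():
    """Map each of the 256 byte values to a byte class."""
    table = [BAD] * 256  # C0, C1 and F5..FF are never valid, so they stay BAD
    ranges = [
        (0x00, 0x7F, ASCII), (0x80, 0x8F, C_80_8F), (0x90, 0x9F, C_90_9F),
        (0xA0, 0xBF, C_A0_BF), (0xC2, 0xDF, L2), (0xE0, 0xE0, L3_E0),
        (0xE1, 0xEC, L3), (0xED, 0xED, L3_ED), (0xEE, 0xEF, L3),
        (0xF0, 0xF0, L4_F0), (0xF1, 0xF3, L4), (0xF4, 0xF4, L4_F4),
    ]
    for lo, hi, cls in ranges:
        for b in range(lo, hi + 1):
            table[b] = cls
    return table


def _build_transition_table():
    """TRANSITIONS[state][byte_class] -> next state; anything unlisted rejects."""
    t = [[REJECT] * 12 for _ in range(9)]
    t[ACCEPT][ASCII] = ACCEPT
    t[ACCEPT][L2] = NEED1
    t[ACCEPT][L3_E0] = AFTER_E0   # E0 must be followed by A0..BF (no overlongs)
    t[ACCEPT][L3] = NEED2
    t[ACCEPT][L3_ED] = AFTER_ED   # ED must be followed by 80..9F (no surrogates)
    t[ACCEPT][L4_F0] = AFTER_F0   # F0 must be followed by 90..BF (no overlongs)
    t[ACCEPT][L4] = NEED3
    t[ACCEPT][L4_F4] = AFTER_F4   # F4 must be followed by 80..8F (max U+10FFFF)
    for c in (C_80_8F, C_90_9F, C_A0_BF):
        t[NEED1][c] = ACCEPT
        t[NEED2][c] = NEED1
        t[NEED3][c] = NEED2
    t[AFTER_E0][C_A0_BF] = NEED1
    t[AFTER_ED][C_80_8F] = NEED1
    t[AFTER_ED][C_90_9F] = NEED1
    t[AFTER_F0][C_90_9F] = NEED2
    t[AFTER_F0][C_A0_BF] = NEED2
    t[AFTER_F4][C_80_8F] = NEED2
    return t


CLASS = _build_class_table()
TRANSITIONS = _build_transition_table()


def is_valid_utf8(data: bytes) -> bool:
    """One table lookup per byte; valid iff we finish between characters."""
    state = ACCEPT
    for byte in data:
        state = TRANSITIONS[state][CLASS[byte]]
        if state == REJECT:
            return False
    return state == ACCEPT


assert is_valid_utf8(b"\xef\xbf\xbe")        # U+FFFE: syntactically fine
assert not is_valid_utf8(b"\xed\xa0\x80")    # surrogate U+D800
assert not is_valid_utf8(b"\xc0\x80")        # overlong NUL
```

A real layer would more likely do the same lookups in C over a buffer, but the structure (byte-class table plus state-transition table) is the same.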

nntp.perl.org: Perl Programming lists via nntp and http.