On 08/23/2011 06:11 AM, Zefram wrote:
> Tom Christiansen wrote:
>> Do I misunderstand strict UTF-8? I thought U+FFFE was a noncharacter guaranteed
>> not to occur in a conformant UTF-8 stream.
>
> It's syntactically valid to encode the codepoint 0xfffe in UTF-8.
> However, U+fffe is a non-character, and so is forbidden in interchange
> of textual data.

I feel the need to make what I consider to be a crucial clarification
here: "They are forbidden in OPEN interchange of textual data". A
private agreement to interchange these overrides the general rule, and
is why these aren't always forbidden.

> We discussed this issue a couple of months ago,
> in the context of PerlIO layers. We figured it was best to separate
> the encoding itself (UTF-8), and strictness regarding the encoding
> syntax, from strictness regarding which codepoints are allowed through.
> I haven't noticed anyone implementing the new layers that were discussed,
> but there was some cleanup in the existing utf8 warnings.

I've been waiting for Nicholas to finish his scheme to create a fast
table-driven byte validation layer.
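
To illustrate the separation described above (strictness about the encoding syntax on one side, policy about which codepoints are allowed through on the other), here is a minimal sketch in Python. It is neither Perl nor the PerlIO layers under discussion, and the names `is_noncharacter`, `decode_utf8`, `allow_noncharacters` and `NoncharacterError` are hypothetical, chosen only for this example. It relies on Python's built-in UTF-8 decoder rejecting malformed byte sequences while accepting noncharacters such as U+FFFE.

```python
class NoncharacterError(Exception):
    """Raised when a noncharacter appears and the caller's policy forbids it."""


def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes
    # (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFE, U+10FFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def decode_utf8(data: bytes, allow_noncharacters: bool = False) -> str:
    # Step 1: encoding syntax only. Malformed sequences raise
    # UnicodeDecodeError; noncharacters pass through untouched.
    text = data.decode("utf-8")

    # Step 2: codepoint policy, kept separate from step 1. The default models
    # open interchange; allow_noncharacters=True models a private agreement.
    if not allow_noncharacters:
        for offset, ch in enumerate(text):
            if is_noncharacter(ord(ch)):
                raise NoncharacterError(
                    f"noncharacter U+{ord(ch):04X} at character offset {offset}"
                )
    return text
```

With this split, the bytes EF BF BE always pass the syntax check, and whether U+FFFE is then accepted depends only on the policy flag the caller passes.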