Zefram wrote:
> karl williamson wrote:
>> I believe this gives the orthogonality that xdg wants;
>
> It's getting closer. It would help if you described the various :foo_utf8
> layers in terms of equivalent pairs of encoding and strictness layers.
>
> I'd like to see a strict distinction between standard UTF-8 and Perl's
> internal extended UTF-8. This is not a matter for the strictness axis,
> it's better treated as an encoding matter. Your discussion for :safe_utf8
> suggests that you're not entirely clear about it. Standard UTF-8 can
> represent any codepoint up to 31 bits, and never uses 0xfe or 0xff octets
> in the encoded form. Perl's extended UTF-8 is extended precisely in
> using 0xfe and 0xff octets to extend the range up to 72 bits. If I ask
> for standard UTF-8 decoding, any 0xfe or 0xff on the input must be an
> error, no matter how permissive I am about which characters I'll accept.

I think we are using the term "standard UTF-8" differently. I'm using
it according to the Unicode standard's definition. Under that definition,
UTF-8 does not include anything above code point 0x10FFFF, nor surrogates.
The non-characters are also not allowed in UTF-8 in open interchange.
By the Unicode definition, Perl's extended UTF-8 is extended by more than
just going beyond 31 bits.

I am of the opinion that we should use the standard's definition of
standard UTF-8 in our documentation, as that is what the rest of the
world will be thinking we mean.

I think that we need to resolve what we mean by standard UTF-8 before
deciding further on the various layers.
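
To make the competing definitions concrete, here is a rough sketch that sorts a code point into the categories mentioned in this thread. It is only an illustration in Python, not Perl's implementation: the function name `classify` and the label strings are hypothetical, and the 31-bit boundary simply follows the description quoted above.

```python
def classify(cp: int) -> str:
    """Sort a code point into the categories discussed in this thread.

    Hypothetical helper for illustration; labels are made up.
    """
    if cp < 0:
        raise ValueError("code points are non-negative")
    if cp > 0x7FFFFFFF:
        # Beyond 31 bits: only reachable with 0xFE/0xFF lead octets,
        # i.e. only in Perl's extended encoding as described above.
        return "perl-extended"
    if cp > 0x10FFFF:
        # Fits the 31-bit scheme, but is outside the Unicode code space.
        return "above-unicode"
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are excluded from UTF-8 by the Unicode definition.
        return "surrogate"
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        # Noncharacters: U+FDD0..U+FDEF and the last two code points
        # of every plane (U+xxFFFE, U+xxFFFF).
        return "noncharacter"
    return "unicode-interchange"


if __name__ == "__main__":
    for cp in (0x41, 0xD800, 0xFFFE, 0x110000, 0x80000000):
        print(f"U+{cp:04X}: {classify(cp)}")
```

Under the Unicode definition, only the last category is unproblematic in open interchange; under the 31-bit reading, everything except "perl-extended" would count as standard.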