Karl Williamson wrote:
> I believe this gives the orthogonality that xdg wants;

It's getting closer. It would help if you described the various :foo_utf8 layers in terms of equivalent combinations of encoding and strictness layers.

I'd like to see a strict distinction between standard UTF-8 and Perl's internal extended UTF-8. This is not a matter for the strictness axis; it's better treated as an encoding matter. Your discussion of :safe_utf8 suggests that you're not entirely clear about it. Standard UTF-8 can represent any codepoint of up to 31 bits, and never uses 0xfe or 0xff octets in the encoded form. Perl's extended UTF-8 is extended precisely in using 0xfe and 0xff octets to extend the range up to 72 bits. If I ask for standard UTF-8 decoding, any 0xfe or 0xff octet in the input must be an error, no matter how permissive I am about which characters I'll accept.

So, suppose we have :encoding(UTF-8) for standard UTF-8, and :encoding(utf8) for Perl's extended UTF-8. (I'd really like to deprecate the latter name, if possible.) And on the strictness axis suppose we have :no_surrogates, :no_above_unicode, and :no_nonchars, as you describe. I think your :foo_utf8 layers are then defined thus:

    :strict_utf8 == :encoding(UTF-8) :no_surrogates :no_above_unicode :no_nonchars
    :safe_utf8   == :encoding(UTF-8) :no_surrogates :no_nonchars
    :unsafe_utf8 == :encoding(utf8)

Obviously, by taking advantage of the orthogonality, there are many other :utf8-like layer combinations that could be named. I don't have very strong opinions about which ones ought to have short names, other than that the easiest to use, :utf8, ought to be quite strict. I don't think :encoding(utf8) with no strictures (your :unsafe_utf8) deserves a short name. The combination of the three stricture layers ought to have a short name, possibly ":strict_unicode".

-zefram
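The split between an encoding axis and a strictness axis proposed in this message can be modelled outside Perl. The following is a minimal Python sketch under stated assumptions, not Perl's implementation and not any real PerlIO API: `decode_standard_utf8`, `stack`, `no_surrogates`, `no_above_unicode`, `no_nonchars`, `strict_utf8` and `safe_utf8` are hypothetical names chosen to mirror the proposed layers. It models only standard (31-bit) UTF-8, in which any 0xfe or 0xff octet is a decoding error before any stricture is consulted; Perl's extended form is deliberately not implemented. Rejecting overlong sequences is an extra assumption not discussed above.

```python
class DecodeError(ValueError):
    """Encoding axis: the octets are not valid for the encoding."""


class RejectedCharacter(ValueError):
    """Strictness axis: a decoded code point is not permitted."""


# Smallest code point for each sequence length (assumption: overlongs rejected).
_MIN_CP = {1: 0x0, 2: 0x80, 3: 0x800, 4: 0x10000, 5: 0x200000, 6: 0x4000000}


def decode_standard_utf8(data: bytes) -> list[int]:
    """Hypothetical encoding layer: standard UTF-8, 1 to 6 octets, up to 31 bits.

    0xFE and 0xFF are always errors, whatever strictness layers follow.
    """
    cps = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead in (0xFE, 0xFF):
            raise DecodeError(f"octet 0x{lead:02X} at offset {i} is not standard UTF-8")
        if lead < 0x80:
            length, cp = 1, lead
        elif lead < 0xC0:
            raise DecodeError(f"stray continuation octet at offset {i}")
        elif lead < 0xE0:
            length, cp = 2, lead & 0x1F
        elif lead < 0xF0:
            length, cp = 3, lead & 0x0F
        elif lead < 0xF8:
            length, cp = 4, lead & 0x07
        elif lead < 0xFC:
            length, cp = 5, lead & 0x03
        else:  # 0xFC or 0xFD: six-octet form, up to 31 bits
            length, cp = 6, lead & 0x01
        tail = data[i + 1 : i + length]
        if len(tail) != length - 1:
            raise DecodeError(f"truncated sequence at offset {i}")
        for octet in tail:
            if (octet & 0xC0) != 0x80:
                raise DecodeError(f"bad continuation octet after offset {i}")
            cp = (cp << 6) | (octet & 0x3F)
        if cp < _MIN_CP[length]:
            raise DecodeError(f"overlong sequence at offset {i}")
        cps.append(cp)
        i += length
    return cps


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_above_unicode(cp: int) -> bool:
    return cp > 0x10FFFF


def is_nonchar(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
    return (0xFDD0 <= cp <= 0xFDEF) or (cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)


def _reject_if(name, predicate):
    """Build a hypothetical strictness layer that rejects matching code points."""
    def layer(cps: list[int]) -> list[int]:
        for cp in cps:
            if predicate(cp):
                raise RejectedCharacter(f"{name}: U+{cp:04X} not allowed")
        return cps
    return layer


no_surrogates = _reject_if("no_surrogates", is_surrogate)
no_above_unicode = _reject_if("no_above_unicode", is_above_unicode)
no_nonchars = _reject_if("no_nonchars", is_nonchar)


def stack(decoder, *filters):
    """Compose one encoding layer with any number of strictness layers."""
    def read(data: bytes) -> list[int]:
        cps = decoder(data)
        for f in filters:
            cps = f(cps)
        return cps
    return read


# The proposed named layers, expressed as encoding + strictness combinations.
strict_utf8 = stack(decode_standard_utf8, no_surrogates, no_above_unicode, no_nonchars)
safe_utf8 = stack(decode_standard_utf8, no_surrogates, no_nonchars)
```

Because each stricture is an independent function, other combinations, such as the three filters together as a possible short-named ":strict_unicode", come from passing a different set of filters to `stack`, without touching the decoder.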