On Fri 24.Dec'10 at 9:03:05 -0700, karl williamson wrote:
> Based on feedback, here's a revised proposal:

This proposal makes sense to me, though I fully acknowledge that I don't
understand all the ins and outs of unicode.

> No layer allows in syntactically malformed utf8
>
> :strict_utf8 allows in only what Unicode says is interchangeable

Does this change from version to version of the Unicode standard? If so,
we may want to explicitly define

    :strict_utf8_60
    :strict_utf8_50

and so on and explicitly state that :strict_utf8 is always an alias to
the most current version of the Unicode standard.

If this Never Changes(tm), ignore this suggestion.

> :safe_utf8 (or maybe :portable_utf8) allows the above plus
> above-unicode code points up to those that begin with 0xfe. It's
> said that 0xfe and 0xff can start looking like utf16, although I
> don't fully understand the whole thing. If we accepted 0xfe and
> not 0xff we still wouldn't ever accept a misconstrued BOM; accepting
> 0xfe goes beyond what a U32 can hold, and so is non-portable.
> Another possibility is for this option to accept only up to what a
> U32 can hold.

I tend to shy away from names including the word "safe" as they
invariably describe something that's discovered not to be safe.

Would it be wrong to describe this one simply as ":utf8"?

> :unsafe_utf8 (or :non_portable_utf8) allows in surrogates,
> noncharacter code points, and all above-unicode code points that
> don't overflow the platform's UV.

:unchecked_utf8? (I don't really care as much about the name on this
one)

> :utf8 is aliased to :safe_utf8. I'm with zefram that the easiest
> thing to do should not allow attack possibilities.
>
> :no_surrogates prohibits surrogates
>
> :no_above_unicode prohibits above-unicode code points
>
> :no_nonchars prohibits non-character code points.
>
> I believe this gives the orthogonality that xdg wants; better name
> suggestions welcome

Best,
Jesse
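
For concreteness, here is a rough sketch of the three code-point classes that the proposed :no_surrogates, :no_nonchars and :no_above_unicode restrictions would filter, and how they might combine orthogonally. It is written in Python purely for illustration and is not Perl's implementation; the function names and the `allowed` helper are hypothetical, and treating "strict" as all three restrictions together is only one reading of the proposal. The ranges themselves come from the Unicode standard.

```python
# Illustrative sketch only; all names here are hypothetical, not a Perl API.

MAX_UNICODE = 0x10FFFF  # highest code point defined by Unicode


def is_surrogate(cp: int) -> bool:
    """True for the UTF-16 surrogate range U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_nonchar(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    return cp <= MAX_UNICODE and (cp & 0xFFFE) == 0xFFFE


def is_above_unicode(cp: int) -> bool:
    """True for code points beyond U+10FFFF."""
    return cp > MAX_UNICODE


def allowed(cp: int, no_surrogates: bool = False, no_nonchars: bool = False,
            no_above_unicode: bool = False) -> bool:
    """Apply each restriction independently, mirroring the :no_* options."""
    if no_surrogates and is_surrogate(cp):
        return False
    if no_nonchars and is_nonchar(cp):
        return False
    if no_above_unicode and is_above_unicode(cp):
        return False
    return True


# With no restrictions everything passes (the "unchecked" end of the scale);
# with all three enabled only ordinary Unicode characters pass.
strict = dict(no_surrogates=True, no_nonchars=True, no_above_unicode=True)
print(allowed(0xD800), allowed(0xD800, **strict))
print(allowed(0x41, **strict))
```

Each flag can be turned on without the others, which is the orthogonality the quoted proposal is after.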