Russ Allbery <rra@Stanford.EDU> writes:

> That's probably unnecessary; I really don't expect them to ever use all
> 31 bytes that the IETF-standardized version of UTF-8 supports.

31 bits, rather. *sigh*

But given that, modulo some debate over CJKV, we're getting into *really* obscure stuff already at only 94,140 characters, I'm guessing that there would have to be some really major and fundamental changes in written human communication before more than two billion characters are used.

Which doesn't mean we should rule out the possibility of ever expanding, since one should always leave that option open, but expending coding effort on it isn't worth it. Particularly since extending UTF-8 to more than 31 bits requires breaking some of the guarantees that UTF-8 makes, unless I'm missing how you're encoding the first byte so as not to give it a value of 0xFE.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
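As a rough illustration of the 0xFE point, here is a minimal Python sketch of the original 31-bit UTF-8 scheme (the one described in RFC 2279; the later RFC 3629 caps UTF-8 at U+10FFFF and four bytes). The function name `encode_utf8_31bit` is hypothetical, and the sketch only checks the 31-bit range; it does not reject surrogates or anything else a real encoder would.

```python
def encode_utf8_31bit(cp: int) -> bytes:
    """Encode a value with the original (RFC 2279-style) UTF-8 scheme,
    which covers 0..0x7FFFFFFF (31 bits) in at most six bytes.
    Hypothetical helper for illustration only."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if cp < 0x80:
        return bytes([cp])  # ASCII passes through as a single byte
    # (largest value, sequence length, lead-byte marker bits) per length
    for limit, n, marker in (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    ):
        if cp <= limit:
            break
    out = []
    # Emit continuation bytes (10xxxxxx), six payload bits each, low bits first.
    for _ in range(n - 1):
        out.append(0x80 | (cp & 0x3F))
        cp >>= 6
    # Remaining high bits go into the lead byte; for n == 6 that is one bit,
    # so the lead byte is 0xFC or 0xFD and never 0xFE or 0xFF.
    out.append(marker | cp)
    return bytes(reversed(out))


if __name__ == "__main__":
    # Largest value for each sequence length; the lead byte tops out at 0xFD.
    for cp in (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF):
        print(f"{cp:#010x} -> {encode_utf8_31bit(cp).hex(' ')}")
```

The six-byte form carries 1 payload bit in the lead byte plus 5 × 6 = 30 bits in continuation bytes, which is exactly 31 bits. A 32nd bit would need a longer sequence whose lead byte would be 0xFE, giving up the property that 0xFE and 0xFF never appear in UTF-8 text.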