Larry Wall <larry@wall.org> writes:
> Russ Allbery writes:

>> Particularly since extending UTF-8 to more than 31 bits requires
>> breaking some of the guarantees that UTF-8 makes, unless I'm missing
>> how you're encoding the first byte so as not to give it a value of
>> 0xFE.

> The UTF-16 BOMs, 0xFEFF and 0xFFFE, both turn out to be illegal UTF-8 in
> any case, so it doesn't much matter, assuming BOMs are used on UTF-16
> that has to be auto-distinguished from UTF-8.  (Doing any kind of
> auto-recognition on 16-bit data without BOMs is problematic in any
> case.)

Yeah, but one of the guarantees of UTF-8 is:

   - The octet values FE and FF never appear.

I can see that this property may not be that important, but it makes me
feel like things that don't have this property aren't really UTF-8.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
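To make the first-byte concern concrete, here is a minimal Python sketch, not anyone's actual implementation. The function name `encode_generalized` is hypothetical. It follows the original 31-bit UTF-8 bit pattern for sequences of 1 to 6 bytes and then naively continues the same pattern to a 7-byte form, which is an assumption made purely for illustration. Under that assumption the 7-byte lead byte has no payload bits left and comes out as 0xFE, which is the value the guarantee above rules out.

```python
def encode_generalized(cp: int) -> bytes:
    """Encode cp with the original UTF-8 bit pattern (1 to 6 bytes, 31 bits).

    The 7-byte form is a hypothetical extension for illustration only;
    it is not part of any UTF-8 standard.
    """
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:
        return bytes([cp])  # single-byte (ASCII) form

    # Find the shortest length n whose payload capacity holds cp.
    for n in range(2, 8):
        # The lead byte has n one-bits and a zero-bit, leaving 7 - n payload
        # bits (none when n == 7); each continuation byte carries 6 bits.
        capacity_bits = max(7 - n, 0) + 6 * (n - 1)
        if cp < (1 << capacity_bits):
            break
    else:
        raise ValueError("value needs more than 36 bits")

    # Continuation bytes, least significant 6 bits first.
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))
        cp >>= 6

    lead_prefix = (0xFF << (8 - n)) & 0xFF  # n leading one-bits
    return bytes([lead_prefix | cp] + tail[::-1])


# Largest 31-bit value: six bytes, lead byte 0xFD, no FE or FF anywhere.
print(encode_generalized(0x7FFFFFFF).hex())  # fdbfbfbfbfbf

# One past 31 bits: the extended pattern forces a lead byte of 0xFE.
print(encode_generalized(0x80000000).hex())  # fe828080808080

# The UTF-16 BOM byte pairs are rejected by a standard UTF-8 decoder.
for bom in (b"\xfe\xff", b"\xff\xfe"):
    try:
        bom.decode("utf-8")
    except UnicodeDecodeError:
        print(bom.hex(), "is not valid UTF-8")
```

For values within the standard range the sketch agrees with Python's built-in encoder, for example `encode_generalized(0x20AC) == "€".encode("utf-8")`. It does not reject surrogates or overlong forms, so it should not be read as a conforming encoder.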