Karl Williamson wrote:
>I think we are using the term "standard UTF-8" differently.

I'm using it to refer to UTF-8 as originally specified (RFC 2279). It
corresponds to the older concept of ISO 10646 as a 31-bit charset, from
back when Unicode was only 16-bit. Later, when Unicode realised that 16
bits wasn't enough, they invented UTF-16 and compromised on the roughly
20.09 bits (code points up to U+10FFFF) that UTF-16 can reach. So
nowadays some statements of the UTF-8 encoding describe only how to
apply it to that 20.09-bit range, which is what you're picking up on. I
view the 20.09-bit limit as a feature of Unicode, not of the encoding.

-zefram
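To make the distinction concrete, here is a sketch (in Python, since the thread has no code of its own) of UTF-8 as originally specified, which encodes the full 31-bit ISO 10646 range using up to six bytes. The helper name `utf8_orig_encode` is mine, purely illustrative. RFC 3629 later forbade the five- and six-byte forms and anything above U+10FFFF, which is the restriction being discussed here.

```python
def utf8_orig_encode(cp: int) -> bytes:
    """Encode a code point with UTF-8 as originally specified
    (the 31-bit scheme of RFC 2279), not the RFC 3629 subset."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("outside the 31-bit range")
    if cp < 0x80:
        return bytes([cp])          # ASCII: one byte, unchanged
    # (highest code point, total bytes, leading-byte prefix)
    forms = [
        (0x7FF,      2, 0xC0),
        (0xFFFF,     3, 0xE0),
        (0x1FFFFF,   4, 0xF0),
        (0x3FFFFFF,  5, 0xF8),      # forbidden by RFC 3629
        (0x7FFFFFFF, 6, 0xFC),      # forbidden by RFC 3629
    ]
    for limit, nbytes, prefix in forms:
        if cp <= limit:
            out = []
            # Peel off 6 bits per continuation byte, low bits first.
            for _ in range(nbytes - 1):
                out.append(0x80 | (cp & 0x3F))
                cp >>= 6
            out.append(prefix | cp)  # remaining high bits + prefix
            return bytes(reversed(out))

# Within Unicode's range it matches ordinary UTF-8:
assert utf8_orig_encode(0x20AC) == "\u20ac".encode("utf-8")
# But it also reaches code points Unicode no longer allows:
assert utf8_orig_encode(0x7FFFFFFF) == b"\xfd\xbf\xbf\xbf\xbf\xbf"
```

The asserts show the point of the thread: the encoding mechanism itself happily covers 31 bits; the U+10FFFF (about 20.09-bit) cap comes from Unicode's commitment to UTF-16, not from UTF-8's byte layout.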