On Mon, Dec 20, 2010 at 1:13 PM, demerphq <demerphq@gmail.com> wrote:

> And the point is that certain codepoints are illegal in general, and
> so can be treated as essentially mapping to themselves. Others are legal
> ONLY in UTF16,

You are calling both the encoded form and the decoded form "code point", and you are using them interchangeably. I can't respond to your post if you don't clear that up.

> So for an example. Consider we have the codepoint U+10400 which case
> folds to U+10428. When represented in UTF-16 the codepoint U+10400
> ends up as the surrogate pair U+D801,U+DC00. Now, if somebody naively
> converts this UTF-16 sequence to UTF8 by converting code point by
> codepoint, the end result will be that OUR code does NOT see codepoint
> U+10400,

No, the result will be a warning when you decode the bad UTF-8 you produced this way.

We all agree the decoder should warn (with an option to disable) when it sees invalid UTF-8 or Unicode.

- Eric
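The sketch below shows the naive conversion from the quoted example. It is written in Python rather than Perl only because Python's codecs make each step visible, and it does not reproduce Perl's own decoder. The helper names are hypothetical.

U+10400 is encoded in UTF-16 as the two code units D801 DC00. Transcoding each unit separately to UTF-8 yields ED A0 81 ED B0 80, which encodes two surrogate halves rather than U+10400. A strict UTF-8 decoder rejects those bytes. Python raises an error at this point, whereas the post describes the decoder warning instead.

```python
# Sketch: naive UTF-16 -> UTF-8 transcoding, one 16-bit code unit at a time.
# Uses Python's codecs for illustration only; this is not Perl's implementation.


def utf16_code_units(s: str) -> list[int]:
    """Return the big-endian UTF-16 code units that encode s."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def naive_utf16_to_utf8(units: list[int]) -> bytes:
    """Encode each code unit on its own, as if it were a code point.

    A lone surrogate is not a valid scalar value, so 'surrogatepass' is
    needed to force it out; the resulting bytes are ill-formed UTF-8.
    """
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def correct_utf16_to_utf8(units: list[int]) -> bytes:
    """Decode the surrogate pair as a whole, then encode the code point."""
    data = b"".join(u.to_bytes(2, "big") for u in units)
    return data.decode("utf-16-be").encode("utf-8")


if __name__ == "__main__":
    s = "\U00010400"
    units = utf16_code_units(s)
    print([hex(u) for u in units])            # ['0xd801', '0xdc00']
    print(s.casefold() == "\U00010428")       # True

    good = correct_utf16_to_utf8(units)
    bad = naive_utf16_to_utf8(units)
    print(good.hex(" "))                      # f0 90 90 80
    print(bad.hex(" "))                       # ed a0 81 ed b0 80

    try:
        bad.decode("utf-8")                   # strict decoding
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

D801 and DC00 are UTF-16 code units, not code points. Converting them one at a time is what produces the bad UTF-8, and the decoder catches it.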