On 25/12/2015 03:21, Zefram wrote:
> demerphq wrote:
>> Is it the one where combining characters are replaced with
>> canonical codepoints?
> Basically, yes. NFC = Composed: precomposed characters are used
> as much as possible. NFD = Decomposed: precomposed characters are
> entirely avoided in favour of separate base and combining characters.
> There's also NFKC and NFKD, which perform compatibility mappings and
> then either Compose or Decompose.
>
> -zefram

The form of normalisation wasn't the main point of my post. My point was
just that we should impose normalisation of some form on the symbol table,
so that code written in different editors doesn't end up with different
code point sequences for the same character.

Take ệ (U+1EC7; LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW) as a
case in point: a text editor could store it as that one code point, or
break it down into any of four other canonically equivalent combinations
of code points.

John
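To make the ệ case concrete, here is a minimal sketch in Python (used only for illustration; it says nothing about how perl's symbol table would or should do this). It lists the five canonically equivalent spellings and checks that each normalisation form collapses them to a single sequence. The names FORMS and normalized_forms are made up for this example.

```python
import unicodedata

# Five canonically equivalent ways to spell U+1EC7.
# (Labels and the FORMS name are illustrative only.)
FORMS = {
    "precomposed":                "\u1ec7",
    "e-dot-below + circumflex":   "\u1eb9\u0302",
    "e-circumflex + dot below":   "\u00ea\u0323",
    "e + dot below + circumflex": "e\u0323\u0302",
    "e + circumflex + dot below": "e\u0302\u0323",
}


def normalized_forms(form="NFC"):
    """Return the set of distinct strings left after normalising every spelling."""
    return {unicodedata.normalize(form, s) for s in FORMS.values()}


if __name__ == "__main__":
    for label, s in FORMS.items():
        codepoints = " ".join(f"U+{ord(c):04X}" for c in s)
        print(f"{label:28} {codepoints}")
    # Without normalisation these are five different strings,
    # so they would be five different symbol names.
    print("distinct raw:", len(set(FORMS.values())))
    print("distinct NFC:", len(normalized_forms("NFC")))
    print("distinct NFD:", len(normalized_forms("NFD")))
```

Either NFC or NFD would do for the purpose of making the five spellings compare equal; the sketch only shows that picking one and applying it consistently removes the ambiguity.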