Dan Sugalski <dan@sidhe.org> writes:

> Should perl's regexes and other character comparison bits have an option
> to consider different characters for the same thing as identical beasts?
> I'm thinking in particular of the Katakana/Hiragana bits of japanese,
> but other languages may have the same concepts.

I think canonicalization gets you that if that's what you want.  I
definitely think that Perl should be able to do all of NFD, NFC, NFKD, and
NFKC canonicalization.

NFC will collapse most canonically equivalent representations of the same
thing to a single form (a base letter followed by a combining accent
becomes the precomposed character, for instance).  NFKC will go further
and also fold the compatibility characters for you, doing stuff like
getting rid of superscripts and the like.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
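
To make the difference between the four forms concrete, here is a rough sketch. It uses Python's standard `unicodedata` module only as a stand-in, and says nothing about how Perl would or should expose this. The helper names `normalize_all` and `same_under` are made up for the illustration. One thing the sketch suggests: the K forms fold halfwidth katakana to the ordinary katakana, but as far as I can tell none of the four forms maps hiragana onto katakana, so that particular comparison would need something beyond normalization.

```python
import unicodedata

# The four Unicode normalization forms named in the post.
FORMS = ("NFD", "NFC", "NFKD", "NFKC")


def normalize_all(text):
    """Return a dict mapping each form name to `text` normalized under it."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


def same_under(form, a, b):
    """True if `a` and `b` are identical after normalizing both with `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical equivalence: "e" + COMBINING ACUTE ACCENT vs. precomposed U+00E9.
decomposed = "e\u0301"
precomposed = "\u00e9"

# Compatibility character: SUPERSCRIPT TWO (U+00B2).
superscript_two = "\u00b2"

# Kana: HIRAGANA KA, KATAKANA KA, HALFWIDTH KATAKANA KA.
hiragana_ka = "\u304b"
katakana_ka = "\u30ab"
halfwidth_ka = "\uff76"

if __name__ == "__main__":
    samples = (decomposed, precomposed, superscript_two, halfwidth_ka)
    for sample in samples:
        # Show code points rather than glyphs so the output is unambiguous.
        for form, result in normalize_all(sample).items():
            points = " ".join(f"U+{ord(ch):04X}" for ch in result)
            print(f"{sample!a:>10} {form:<4} -> {points}")
```

A comparison routine built on this would normalize both operands with the same form before comparing them, which is what `same_under` does.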