One thing that may need some consideration, now or later, is precomposed versus decomposed characters, that is, "a grave" as a single character versus "a" followed by a combining "grave". For example, when searching for "agrave" you would probably want "a" plus "grave" to match as well, and vice versa. Well, you would want that most of the time, anyway.

Food for thought: should Perl always keep its utf8 data in the decomposed form, so that it is canonical? Or, the other way around, should it always try to find the composed form, to be more compact? A canonical form would make searching the data rather easier. Then again, canonicalizing the data like that would be bad on output: if an incoming "odiaeresis" became "o" plus "diaeresis" on the way out, some external entity could get confused.

--
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
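
To make the precomposed versus decomposed question concrete, here is a minimal sketch. It is written in Python purely for illustration and says nothing about how Perl does or should handle this internally. It assumes the Unicode normalization forms NFD (fully decomposed) and NFC (composed) as the two candidate canonical forms; the helper name `canonical_equal` is hypothetical. Perl's Unicode::Normalize module offers comparable NFD and NFC functions.

```python
import unicodedata


def canonical_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e0"   # "agrave" as one code point
decomposed = "a\u0300"   # "a" followed by a combining grave accent

# A raw comparison sees two different code point sequences.
raw_match = precomposed == decomposed

# Comparing in a canonical form treats them as the same text.
canon_match = canonical_equal(precomposed, decomposed)

# Composing again on output restores the compact single code point
# that an external consumer may be expecting.
o_diaeresis = unicodedata.normalize("NFC", "o\u0308")

print(raw_match, canon_match, len(o_diaeresis))
```

Normalizing only at comparison time, as in this sketch, leaves the stored data untouched, which sidesteps the output problem at the cost of doing the work on every search.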