perl.perl6.internals | Postings from June 2001

Re: More character matching bits

From: Russ Allbery
Date: June 11, 2001 13:05
Subject: Re: More character matching bits
Message ID: yl66e2kciw.fsf@windlord.stanford.edu

Dan Sugalski <dan@sidhe.org> writes:

> Should perl's regexes and other character comparison bits have an option
> to consider different characters for the same thing as identical beasts?
> I'm thinking in particular of the Katakana/Hiragana bits of Japanese,
> but other languages may have the same concepts.

I think canonicalization gets you that if that's what you want.  I
definitely think that Perl should be able to do all of NFD, NFC, NFKD, and
NFKC canonicalization.
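
To make the four forms concrete, here is a minimal sketch in Python (not
Perl) using the standard unicodedata module.  The helper name forms() and
the sample strings are only illustrative assumptions; the point is that two
different spellings of the same character compare unequal as raw strings
but come out identical once both are put through the same form.

```python
import unicodedata

# The same letter spelled two ways: precomposed U+00E9, and
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "\u00e9"
combining = "e\u0301"

def forms(s):
    """Return s under each of the four Unicode normalization forms."""
    names = ("NFD", "NFC", "NFKD", "NFKC")
    return {name: unicodedata.normalize(name, s) for name in names}

# Raw comparison sees two different code point sequences.
raw_equal = precomposed == combining

# After normalization both spellings agree under every form.
normalized_equal = forms(precomposed) == forms(combining)

print(raw_equal, normalized_equal)
```

A comparison that wants this behaviour would normalize both operands to the
same form first and compare the results.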

NFC will collapse canonically equivalent spellings of the same character to
a single form.  NFKC will go further and also fold the compatibility
characters, doing stuff like getting rid of superscripts and the like.
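
As a rough illustration of where NFC stops and NFKC continues, here is a
small Python sketch using the standard unicodedata module.  The helper
names nfc() and nfkc() and the sample characters are assumptions chosen for
the example.  Note that in this sketch hiragana and katakana stay distinct
under both forms; only halfwidth katakana folds to fullwidth katakana under
NFKC, so folding one script onto the other would need a separate mapping.

```python
import unicodedata

def nfc(s):
    """Canonical composition only."""
    return unicodedata.normalize("NFC", s)

def nfkc(s):
    """Compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", s)

angstrom_sign = "\u212b"   # ANGSTROM SIGN, canonically equivalent to U+00C5
superscript = "x\u00b2"    # "x" followed by SUPERSCRIPT TWO
halfwidth_ka = "\uff76"    # HALFWIDTH KATAKANA LETTER KA
katakana_ka = "\u30ab"     # KATAKANA LETTER KA
hiragana_ka = "\u304b"     # HIRAGANA LETTER KA

# NFC folds canonical duplicates but leaves compatibility characters alone.
print(nfc(angstrom_sign) == "\u00c5")      # True
print(nfc(superscript) == superscript)     # True

# NFKC also folds compatibility characters.
print(nfkc(superscript) == "x2")           # True
print(nfkc(halfwidth_ka) == katakana_ka)   # True

# Neither form maps hiragana onto katakana.
print(nfkc(hiragana_ka) == katakana_ka)    # False
```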

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>
