perl.perl6.internals | Postings from June 2001

Re: More character matching bits

From: Buddha Buck
Date: June 11, 2001 13:43
Subject: Re: More character matching bits
Message ID: 5.0.2.1.0.20010611163842.00be88a0@128.205.32.3
At 01:14 PM 06-11-2001 -0700, Russ Allbery wrote:
>Dan Sugalski <dan@sidhe.org> writes:
> > At 01:05 PM 6/11/2001 -0700, Russ Allbery wrote:
> >> Dan Sugalski <dan@sidhe.org> writes:
>
> >>> Should perl's regexes and other character comparison bits have an
> >>> option to consider different characters for the same thing as
> >>> identical beasts?  I'm thinking in particular of the Katakana/Hiragana
> > >>> bits of Japanese, but other languages may have the same concepts.
>
> >> I think canonicalization gets you that if that's what you want.
>
> > I don't think canonicalization should do this. (I really hope not) This
> > isn't really a canonicalization matter--words written with one character
> > set aren't (AFAIK) the same as words written with the other, and which
> > alphabet you use matters. (Which sort of argues against being able to do
> > this, I suppose...)
>
>I guess I don't know what the definition of "the same thing" you're using
>here is.

I thought Dan was talking about something equivalent to the m//i functionality.

Would it, or should it, be possible to tell m// to treat Katakana 
characters as the same as Hiragana characters, in much the same way as m//i 
treats UPPERCASE the same as lowercase?  Canonicalization won't get you that:
Unicode normalization keeps Katakana and Hiragana as distinct characters.

My feeling is that the hooks should be there, but the specific equivalence 
mappings should be in the library, not the core.
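A minimal sketch of that split, written in Python rather than Perl, with hypothetical names (`KANA_FOLD`, `fold_kana`, `search_folded`) that come from no real Perl or Parrot API. The matcher only accepts a caller-supplied fold function (the "hook"), while the Katakana-to-Hiragana table lives outside it (the "library"). It assumes the fold maps strictly one character to one character, which holds here because Katakana U+30A1 to U+30F6 sit exactly 0x60 code points above Hiragana U+3041 to U+3096, so match offsets in the folded text still line up with the original subject.

```python
import re
from typing import Callable

# Hypothetical library-side equivalence mapping: fold each Katakana letter
# (U+30A1..U+30F6) onto its Hiragana counterpart 0x60 code points lower.
# Characters outside that range (ASCII, the prolonged sound mark, etc.)
# are left unchanged by str.translate.
KANA_FOLD = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def fold_kana(text: str) -> str:
    """Map Katakana to Hiragana; everything else passes through."""
    return text.translate(KANA_FOLD)


def search_folded(pattern: str, subject: str, fold: Callable[[str], str]):
    """Hypothetical core-side hook: apply a caller-supplied fold to both the
    pattern and the subject before matching, analogous to how m//i folds
    case. Because the fold is one-to-one per character, the returned match
    offsets can be used directly on the unfolded subject."""
    return re.search(fold(pattern), fold(subject))
```

For example, `search_folded("カタカナ", "ひらがなとかたかな", fold_kana)` finds the Hiragana run `かたかな` at offsets 5 to 9, while a plain `re.search` with the same arguments finds nothing. Keeping the table out of the matcher means other equivalences could be plugged in the same way without touching the core.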


