
Re: another attempt at adding unicode regex support to perl

From: Jeff 'japhy' Pinyan
Date: May 28, 2003 23:44
Subject: Re: another attempt at adding unicode regex support to perl
Message ID: Pine.LNX.4.44.0305290219240.2009-100000@perlmonk.org

On May 27, Jeff 'japhy' Pinyan said:

>I've been trying to add Unicode's regex charclass-magic to Perl (again),
>and I think I've run into the same problem as before.

I've hit a snag.  If a charclass has intersection or subtraction in it,
and locale is on or \p{...} classes are used, the charclass must be
represented ENTIRELY as those "+utf8::XXX" strings.  Here's why.

If locale is on, then a charclass like [[\w&&[\d]][aeiou]] will have to be
represented as "+utf8::IsAlnum &utf8::IsDigit +utf8::Is_aeiou_" (or
something like that), because (since locale is on) \w doesn't modify the
charclass's bitmap array, but just turns on the ANYOF_ALNUM flag.  Since
precedence is an issue, we can't just check flags.

This will mean we'll be suffering some inefficiency (but that should be
expected with Unicode right now, right?).  It also means I need to come up
with on-the-fly Unicode classes that match a specific set of characters I
decide on at that moment.  What's the easiest way to do that?  I need to
know this to get intersection and subtraction working.
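
For illustration only, here is a rough user-level sketch of the same
idea, using the user-defined character properties described in
perlunicode rather than anything inside the regex compiler.  It is not
the patch under discussion.  The names IsAeiou, IsDigitOrVowel and
in_class are hypothetical, and the sketch assumes a Perl that applies
the "+" and "&" lines of a property definition in order:

```perl
use strict;
use warnings;

# Hypothetical property: exactly the characters a, e, i, o, u.
# Each returned line is one code point written in hex.
sub IsAeiou {
    return join '', map { sprintf "%04X\n", ord($_) } qw(a e i o u);
}

# Hypothetical property built like the string in the post:
# (IsAlnum intersected with IsDigit), then unioned with IsAeiou.
# Assumption: the lines are applied top to bottom.
sub IsDigitOrVowel {
    return <<'END';
+utf8::IsAlnum
&utf8::IsDigit
+main::IsAeiou
END
}

# Returns 1 if the single character belongs to the composed class.
sub in_class {
    my ($char) = @_;
    return $char =~ /\A\p{IsDigitOrVowel}\z/ ? 1 : 0;
}

print "$_: ", in_class($_), "\n" for qw(a 5 z _);
```

A property sub like IsAeiou is one way to get a class matching a
specific, hand-picked set of characters: it only has to return the
code points as hex lines, so the list can be generated on demand.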

-- 
Jeff "japhy" Pinyan      japhy@pobox.com      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]

