
Re: another attempt at adding unicode regex support to perl

Jeff 'japhy' Pinyan
May 28, 2003 23:44
On May 27, Jeff 'japhy' Pinyan said:

>I've been trying to add Unicode's regex charclass-magic to Perl (again),
>and I think I've run into the same problem as before.

I've hit a snag.  If a charclass has intersection or subtraction in it,
and locale is on or \p{...} classes are used, the charclass must be
represented ENTIRELY as those "+utf8::XXX" strings.  Here's why.

If locale is on, then a charclass like [[\w&&[\d]][aeiou]] will have to be
represented as "+utf8::IsAlnum &utf8::IsDigit +utf8::Is_aeiou_" (or
something like that), because (since locale is on) \w doesn't modify the
charclass's bitmap array, but just turns on the ANYOF_ALNUM flag.  Since
the operations have to be applied in order, and a set of flags records no
order, we can't just check flags.
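
To make the ordering point concrete, here is a minimal sketch in C.  It
is not code from regcomp.c: class_op, eval_class and the is_* helpers
are hypothetical names, the set is limited to 256 characters, and \w is
approximated by isalnum() plus underscore.  It only shows that the same
three pieces give different sets depending on the order in which the
operators are applied.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define NCHARS 256

/* One step of a class expression: an operator plus a membership test.
 * All names here are illustrative, not part of the perl source. */
typedef int (*member_fn)(int c);
typedef struct { char op; member_fn in; } class_op;  /* '+', '&' or '-' */

static int is_alnum(int c) { return isalnum(c) || c == '_'; } /* rough \w */
static int is_digit(int c) { return isdigit(c) != 0; }
static int is_aeiou(int c) { return c != 0 && strchr("aeiou", c) != NULL; }

/* Apply the steps strictly left to right, the way a "+X &Y +Z" string
 * would be read.  set[c] ends up 1 if c is in the class, else 0. */
static void eval_class(const class_op *ops, size_t n,
                       unsigned char set[NCHARS])
{
    memset(set, 0, NCHARS);
    for (size_t i = 0; i < n; i++) {
        for (int c = 0; c < NCHARS; c++) {
            int m = ops[i].in(c) != 0;
            switch (ops[i].op) {
            case '+': set[c] |= m;  break;   /* union        */
            case '&': set[c] &= m;  break;   /* intersection */
            case '-': set[c] &= !m; break;   /* subtraction  */
            default:  assert(0 && "unknown operator");
            }
        }
    }
}
```

Evaluating { +is_alnum, &is_digit, +is_aeiou } gives the digits plus
a, e, i, o, u.  Moving the union in front of the intersection, as in
{ +is_alnum, +is_aeiou, &is_digit }, gives only the digits.  A group of
"is this flag on?" bits cannot tell those two cases apart.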

This will mean we'll be suffering some inefficiency (but that should be
expected with Unicode right now, right?).  It also means I need to come up
with on-the-fly Unicode classes that match a specific set of characters I
decide on at that moment.  What's the easiest way to do that?  I need to
know this to get intersection and subtraction working.
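
One possible direction, sketched in C under stated assumptions: turn the
set of characters into the hex range list that perlunicode documents for
user-defined properties (one hex code point per line, or two hex numbers
separated by a tab for a range).  set_to_ranges is a hypothetical
helper, the set is limited to 256 characters, and whether the regex
compiler can hand such a string to the swash code directly is an
assumption, not something established here.

```c
#include <assert.h>
#include <stdio.h>

#define NCHARS 256

/* Write the members of set[] to out as newline-terminated hex entries:
 * "XXXX" for a lone code point, "XXXX<TAB>YYYY" for a run of adjacent
 * ones.  Returns the number of bytes written (not counting the NUL),
 * or -1 if out is too small.  Hypothetical helper, for illustration. */
static int set_to_ranges(const unsigned char set[NCHARS],
                         char *out, size_t outsz)
{
    size_t used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (int c = 0; c < NCHARS; c++) {
        if (!set[c])
            continue;

        int start = c;
        while (c + 1 < NCHARS && set[c + 1])   /* extend the run */
            c++;

        int n = (start == c)
            ? snprintf(out + used, outsz - used, "%04X\n", start)
            : snprintf(out + used, outsz - used, "%04X\t%04X\n", start, c);

        if (n < 0 || (size_t)n >= outsz - used)
            return -1;                          /* would not fit */
        used += (size_t)n;
    }
    return (int)used;
}
```

For a set holding the digits and a, e, i, o, u, this writes the range
0030<TAB>0039 followed by the single entries 0061, 0065, 0069, 006F and
0075, one per line.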

Jeff "japhy" Pinyan
RPI Acacia brother #734
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]
