On May 27, Jeff 'japhy' Pinyan said:

>I've been trying to add Unicode's regex charclass-magic to Perl (again),
>and I think I've run into the same problem as before.

I've hit a snag. If a charclass has intersection or subtraction in it, and
locale is on or \p{...} classes are used, the charclass must be represented
ENTIRELY as those "+utf8::XXX" strings.

Here's why. If locale is on, then a charclass like

  [[\w&&[\d]][aeiou]]

will have to be represented as

  "+utf8::IsAlnum &utf8::IsDigit +utf8::Is_aeiou_"

(or something like that), because with locale on, \w doesn't modify the
charclass's bitmap array; it just turns on the ANYOF_ALNUM flag. Since
precedence is an issue, we can't just check flags.

This means we'll suffer some inefficiency (but that should be expected with
Unicode right now, right?).

It also means I need to come up with on-the-fly Unicode classes that match a
specific set of characters I decide on at that moment. What's the easiest way
to do that? I need to know this to get intersection and subtraction working.

-- 
Jeff "japhy" Pinyan      japhy@pobox.com      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
[  I'm looking for programming work.  If you like my work, let me know.  ]
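
To illustrate the precedence point in the message above, here is a minimal sketch in Python (not Perl's regex engine, and not the author's implementation). It models a charclass as an ordered list of operator/set pairs, loosely mirroring the "+utf8::X &utf8::Y" strings. The names WORD, DIGIT, VOWELS, and apply_ops are hypothetical, and the sets are ASCII-only stand-ins for classes that in practice depend on locale or Unicode.

```python
import string

# Toy ASCII-only stand-ins (assumptions for illustration only); the real
# \w and \d depend on locale/Unicode, which is why they cannot simply be
# folded into a fixed bitmap at compile time.
WORD = set(string.ascii_letters + string.digits + "_")
DIGIT = set(string.digits)
VOWELS = set("aeiou")


def apply_ops(ops):
    """Evaluate an ordered list of (op, set) pairs from left to right.

    '+' is union, '&' is intersection, '-' is subtraction.
    """
    result = set()
    for op, chars in ops:
        if op == "+":
            result |= chars
        elif op == "&":
            result &= chars
        elif op == "-":
            result -= chars
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# [[\w&&[\d]][aeiou]] read in order: +word &digit +vowels
in_order = apply_ops([("+", WORD), ("&", DIGIT), ("+", VOWELS)])

# The same three operands with the intersection applied last, as could
# happen if only "which classes are present" were recorded.
reordered = apply_ops([("+", WORD), ("+", VOWELS), ("&", DIGIT)])

print("a" in in_order)   # vowels survive when the order is preserved
print("a" in reordered)  # vowels are lost when the order is not preserved
```

The two results differ, which is why an unordered set of flags is not enough to represent a class that mixes union with intersection or subtraction.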