karl williamson wrote:
> demerphq wrote:
> [snip]
>>>>
>>>> I personally consider character class notation to be an abbreviation
>>>> of alternation. So a character class [xyz] is supposed to match the
>>>> same thing as (x|y|z). This implies that character classes have to be
>>>> able to match more than one character under case-folding rules. A lot
>>>> of external logic and at least some internal logic operates under this
>>>> assumption, so i dont think we can change it.
>>>>
>>> That sounds right.
>>
>> Im trying to imagine a way to do this that doesn't involve a pretty
>> considerable redesign of how character classes work, and not coming up
>> with much.
>>
>> Yves
>>
>
> I've only time right now to address this last point in your response.
> I'll look at the rest later.
>
> What I know is that regcomp.c attempts to handle some of this. Here is
> a little of it starting at line 8324:
>     /* Any multicharacter foldings
>      * require the following transform:
>      * [ABCDEF] -> (?:[ABCabcDEFd]|pq|rst)
>      * where E folds into "pq" and F folds
>      * into "rst", all other characters
>      * fold to single characters. We save
>      * away these multicharacter foldings,
>      * to be later saved as part of the
>      * additional "s" data. */
>     SV *sv;
>
>     if (!unicode_alternate)
>         unicode_alternate = newAV();
>     sv = newSVpvn_utf8((char*)foldbuf, foldlen,
>                        TRUE);
>     av_push(unicode_alternate, sv);
>
> But it's not working. I never found the time to pursue it. But perhaps
> you meant that it doesn't handle things like ß =~ /s{2}/
>

Here is another idea that might be helpful. I looked up the discussion
about tricky folds in this list's archives, and someone suggested an idea
that I had also been thinking of independently; it didn't look like there
was any response to it. The idea, in effect, is that instead of using
trickyfold, we pretend for the tricky fold characters that the input was
a mapping of them. For ß, for example, pretend it was
(?:ß|[Ss][Ss]|\x{1e9e}).
Then the optimizer wouldn't have to be fooled.
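
To make the transform in the quoted regcomp.c comment concrete, here is a minimal sketch in Python rather than in Perl's C internals. It is not how regcomp.c does it; it only rewrites a character class as an alternation so that characters with multi-character folds can also match their folded form. The names `MULTI_CHAR_FOLDS`, `expand_class`, and `matches` are hypothetical, and the fold table is a hand-picked subset, not the full Unicode data.

```python
import re

# Hypothetical, hand-picked subset of multi-character case folds.
# The authoritative data is Unicode's CaseFolding.txt.
MULTI_CHAR_FOLDS = {
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
    "\ufb00": "ff",  # LATIN SMALL LIGATURE FF
}


def expand_class(chars):
    """Rewrite the class [chars] as (?:[chars]|fold1|fold2...).

    Assumes chars is a non-empty string of literal class members.
    """
    branches = ["[" + "".join(re.escape(c) for c in chars) + "]"]
    seen = set()
    for ch in chars:
        fold = MULTI_CHAR_FOLDS.get(ch)
        if fold is not None and fold not in seen:
            seen.add(fold)
            # Each multi-character fold becomes its own alternative.
            branches.append(re.escape(fold))
    return "(?:" + "|".join(branches) + ")"


def matches(chars, text):
    """Case-insensitively match all of text against the expanded class."""
    return re.fullmatch(expand_class(chars), text, re.IGNORECASE) is not None
```

With this sketch, `matches("xß", "SS")` succeeds, whereas the plain class `[xß]` cannot match "SS" at all because a character class consumes exactly one character. That is the same gap the `(?:ß|[Ss][Ss]|\x{1e9e})` rewrite is meant to close.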