demerphq wrote:
[snip]
>>>
>>> I personally consider character class notation to be an abbreviation
>>> of alternation. So a character class [xyz] is supposed to match the
>>> same thing as (x|y|z). This implies that character classes have to be
>>> able to match more than one character under case-folding rules. A lot
>>> of external logic and at least some internal logic operates under this
>>> assumption, so i dont think we can change it.
>>>
>> That sounds right.
>
> Im trying to imagine a way to do this that doesn't involve a pretty
> considerable redesign of how character classes work, and not coming up
> with much.
>
> Yves
>

I've only time right now to address this last point in your response.
I'll look at the rest later.

What I know is that regcomp.c attempts to handle some of this. Here is
a little of it starting at line 8324:

    /* Any multicharacter foldings
     * require the following transform:
     * [ABCDEF] -> (?:[ABCabcDEFd]|pq|rst)
     * where E folds into "pq" and F folds
     * into "rst", all other characters
     * fold to single characters. We save
     * away these multicharacter foldings,
     * to be later saved as part of the
     * additional "s" data. */
    SV *sv;

    if (!unicode_alternate)
        unicode_alternate = newAV();
    sv = newSVpvn_utf8((char*)foldbuf, foldlen, TRUE);
    av_push(unicode_alternate, sv);

But it's not working. I never found the time to pursue it.

But perhaps you meant that it doesn't handle things like ß =~ /s{2}/
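
To make the transform in that comment concrete, here is a rough sketch of the idea outside of regcomp.c. It is written in Python purely for illustration and is not how perl implements it: expand_class is a hypothetical helper, and Python's str.casefold() stands in for perl's fold tables. It leaves out ranges, negated classes, and characters that fold to the same string by different routes.

```python
import re


def expand_class(chars):
    """Rewrite the members of a character class as an alternation.

    Hypothetical helper, for illustration only. Each member stays in the
    bracketed class, single-character folds are added to the class, and
    multi-character folds become separate branches, mirroring
    [ABCDEF] -> (?:[ABCabcDEFd]|pq|rst).
    """
    singles = []  # characters kept inside the bracketed class
    multis = []   # multi-character fold strings, e.g. "ss" for "ß"
    for ch in chars:
        if ch not in singles:
            singles.append(ch)
        folded = ch.casefold()  # Unicode full case folding
        if len(folded) == 1:
            if folded not in singles:
                singles.append(folded)
        elif folded not in multis:
            multis.append(folded)
    branches = ["[" + "".join(re.escape(c) for c in singles) + "]"]
    branches += [re.escape(m) for m in multis]
    return "(?:" + "|".join(branches) + ")"


# A plain one-character class can never match the two-character fold:
plain = re.fullmatch("[aß]", "ss", re.IGNORECASE)

# The expanded form has an explicit branch for it:
expanded = re.compile(expand_class("aß"), re.IGNORECASE)
via_branch = expanded.fullmatch("SS")

# The other direction (ß =~ /s{2}/) needs the subject folded instead:
unfolded = re.fullmatch("s{2}", "ß", re.IGNORECASE)
folded = re.fullmatch("s{2}", "ß".casefold())
```

Here plain and unfolded come out as None, while via_branch and folded are match objects. The last two lines are the case from the end of the message: expanding the class does nothing when the multi-character side is in the pattern and the single character is in the string being matched, so that direction needs separate handling.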