On 04/28/2011 06:32 PM, Tom Christiansen wrote:
>> Wouldn't backing out multichar folds for 5.14 introduce a regression?
>
> Specifically, it would break things like this, which already worked:
>
> % perl5.12.0 -E 'say "\x{FB00}" =~ /ff/i || 0'
> 1
> ...
> % perl5.12.3 -E 'say "\x{FB00}" =~ /ff/i || 0'
> 1
>
> --tom
>
Yes, it would. My point was that that appears to be where Unicode is headed.
But there are no guarantees that that is where they'll end up.

A middle position would be to disable them only in bracketed character
classes. I think most of the astonishment stems from those, particularly
when they are inverted. This is also where the pre-5.14 implementation was
buggiest. There were cases where it worked, but mostly it didn't. And most
of the cases where it worked were those where the class got optimized into
an EXACTF node. We'd now have to decide what to do in that situation. My
position would be that we shouldn't do that optimization if the result
would match multiple characters.

To state it more clearly, I guess I'm now putting forth the idea that the
least bad choice for 5.14 is to say that a bracketed character class can
match only a single input character. Most people expect that anyway, and
it would cause the fewest regressions. Almost all regressions would be of
the form that /[ß]/i would no longer mean the same thing as /ß/i: for
example, "ss" would match /^ß$/i but not /^[ß]$/i.
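
A rough model of the proposed rule, not of how the regex engine implements it. It assumes Perl 5.16 or later for fc() (which postdates this thread), and the helper names literal_matches and class_matches_one are hypothetical:

```perl
use v5.16;      # enables strict, say, fc, and unicode_strings
use warnings;

# Hypothetical model, not the regex engine: a literal under /i compares the
# full case folds of the whole strings, so one character may match two.
sub literal_matches {
    my ($input, $literal) = @_;
    return fc($input) eq fc($literal) ? 1 : 0;
}

# Hypothetical model of the proposal: a bracketed class consumes exactly one
# input character, which matches if it folds like some member of the class.
sub class_matches_one {
    my ($input, @members) = @_;
    return 0 unless length($input) == 1;
    my $folded = fc($input);
    return (grep { fc($_) eq $folded } @members) ? 1 : 0;
}

my $sharp_s = "\x{DF}";                        # LATIN SMALL LETTER SHARP S
say literal_matches("ss", $sharp_s);           # 1: like "ss" =~ /^\x{DF}$/i
say class_matches_one("ss", $sharp_s);         # 0: the class cannot consume "ss"
say class_matches_one("\x{1E9E}", $sharp_s);  # 1: capital sharp s is one char
say literal_matches("\x{FB00}", "ff");         # 1: like the 5.12 example above
```

Under this model the two forms diverge only on multi-character inputs such as "ss", which is the kind of regression described above.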

The idea of allowing a non-inverted class to match multi-character folds
while an inverted one cannot scares me.