On 09/07/2011 07:07 PM, Karl Williamson wrote:
> On 09/07/2011 09:30 AM, Tom Christiansen wrote:
>> "Rafael Garcia-Suarez via RT"<perlbug-followup@perl.org> wrote
>> on Tue, 06 Sep 2011 14:40:00 PDT:
>>
>>> On 6 September 2011 17:29, tchrist1 wrote:
>>>> Summary: If you use -E, matches fail that work fine under -e. This is
>>>> in some sense the opposite of the Unicode bug, which normally
>>>> works the other way around.
>>>>
>>
>>> That would be because -E turns on all current features, including
>>> "unicode_strings"
>>
>> Yes, I realize that. Normally, "The Unicode Bug" is cleared up by
>> enabling the unicode_strings feature, whereas here doing so triggers it:
>>
>>> ~§ perl -Mcharnames=:full -Mfeature=unicode_strings -le 'print
>>> "s\N{LATIN SMALL LIGATURE LONG S T}" =~ /sst/i ? "Pass" : "Fail"'
>>> Fail
>>
>>> ~§ perl -Mcharnames=:full -lE 'no feature "unicode_strings";print
>>> "s\N{LATIN SMALL LIGATURE LONG S T}" =~ /sst/i ? "Pass" : "Fail"'
>>> Pass
>>
>>> Now, I would let Karl comment on whether this is a bug in
>>> unicode_strings, or not...
>>
>> I can't really see how it wouldn't be.
>>
>> --tom
>>
>
> It's not a bug in unicode_strings, as that is irrelevant here, since the
> string is in utf8. It is a bug in regcomp.c; and it is from my
> overlooking the situation specified in the bug report. I am thinking
> about solutions; probably a static analysis done under control of regen
> to look for cases where the tail of a multichar fold can be the head of
> another, and then have regcomp look for those and substitute in an
> appropriate pattern. In the case of sst it would be something like
> (?i:sst|\x{df}t|s\x{fb05}) plus whatever else the analysis for this
> situation calls for, and the /i matching in the result is restricted to
> single char folds, so e.g., \x{df} will match its capital, but not
> expand out to 'ss' again.
>
> This still won't handle the cases like (ss)(t), etc.
>
I should do more baking before I publicize my partially-baked ideas. I
thought some more about this while swimming, and I think the solution is
quite different from what I presented above, but it will need some more
baking before I can be sure.
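
For reference, the overlap described in the quoted message (the tail of
the 'ss' fold of \x{df} being the head of the 'st' fold of \x{fb05}) can
be checked with a minimal sketch like the one below. It only confirms
that the three spellings share one full case fold; it is not the fix
itself. It assumes a perl new enough to have fc() (5.16 or later, so
newer than the perl discussed in this thread), and full_fold() is a
hypothetical helper name used only for illustration.

```perl
use v5.16;      # assumption: 5.16+; enables strict, fc and unicode_strings
use warnings;

# Hypothetical helper, not part of perl or of this thread:
# return the full Unicode case fold of a string.
sub full_fold { return fc($_[0]) }

# The three spellings from the proposed alternation.
my @candidates = (
    [ 'sst',       "sst"       ],
    [ '\x{DF}t',   "\x{DF}t"   ],   # LATIN SMALL LETTER SHARP S, then t
    [ 's\x{FB05}', "s\x{FB05}" ],   # s, then LATIN SMALL LIGATURE LONG S T
);

for my $c (@candidates) {
    my ($label, $string) = @$c;
    my $verdict = full_fold($string) eq 'sst' ? 'folds to sst'
                                              : 'does not fold to sst';
    printf "%-10s %s\n", $label, $verdict;
}
```

unicode_strings is switched on here (via "use v5.16") so that the
non-utf8 "\x{DF}t" is folded with Unicode rules rather than ASCII ones.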