Front page | perl.perl5.porters |
Postings from August 2010
Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
From:
karl williamson
Date:
August 6, 2010 11:01
Subject:
Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
Message ID:
4C5C4DCE.1080702@khwilliamson.com
H.Merijn Brand wrote:
> On Fri, 6 Aug 2010 08:14:02 -0400, David Golden <xdaveg@gmail.com>
> wrote:
>
>> On Fri, Aug 6, 2010 at 7:36 AM, karl williamson <public@khwilliamson.com> wrote:
>>> I did an analysis of this, and it turns out that the only ambiguous case is
>>> 's/foo/bar/le'. It seems like overkill for this to invent a new temporary
>>> pragma, and forbid all the new modifiers as suffixes, when there is no
>>> ambiguity at all outside of substitutions, and no ambiguity using
>>> substitutions except for one combination out of all those possible. Why
>>> can't we just say in the pods and warning message that '/le' must be
>>> written as '/el' in 5.14?
>> Help me understand what you mean by ambiguous. If there is really only
>> one case, then great!
>>
>> But hypothetically, what would s/foo/bar/elt1 do? Would the "l" parse
>> as a modifier or would it parse as bar of "lt"?
>>
>> Here's a stupid, but legal example:
>>
>> $ perl -wE '$_=<>; sub bar { "bar" }; if ( s/foo/bar/elt 1 ) { say
>> "not done" }'
>
> And mind you that some module might add those flags dynamically. I know
> abigail does some funky stuff, but I bet others do to. Then an eval of
> code with a generated regex where the l got inserted just before the e
> will suddenly fail.
>
> FWIW I have no strong opinions here, just pointing to possible places
> of hurt.
>
I'm not sure where to start.
So I'll start here: My waking-up-in-the-middle-of-the-night analysis
was somewhat flawed (the latest is included at the end for you to check).
First, when I said 'le' was the only possible conflict, I should have
said any combination that has 'le' in it, such as '/gle'. Actually, I
think, any combination that ends in 'le', so '/glex' isn't ambiguous.
Anyway, that really doesn't change the original claim.
And the new analysis shows one additional issue I had overlooked if we
use the 't' modifier, that issue being 'gt'. If we switch to using 'd'
instead, as I'd already been leaning towards, it goes back down to the
single problem ('le') that I identified earlier.
However, the new analysis shows two problems with the recently added 'r'
modifier: '/or' and '/xor'. Hence I've retitled the subject of this
post to include Yves' earlier comment on the whimsical nature of finding
these backward compatibility issues. 'r' was added with nary a peep,
IIRC, about such things. There is a .t patch in the queue somewhere,
BTW, which, if it had ever been applied, I think would have found these.
The reason there are so few charset modifier issues is that we
decided that if there were more than one mutually exclusive flag, it
would be a syntax error. Thus 'lt' in David's example is not ambiguous.
The 'lt' has to be the less-than operator, because any other reading
leads to a syntax error.
I had come up with preferring 'd' as the modifier meaning the
traditional behavior instead of 't', because I think it succinctly
describes what is happening: the character set used is like a
dual-valued variable. It can be native sometimes, and Unicode other
times. (I personally think this behavior is crazy.) 'd' really lays
out what this means, whereas 'traditional' is sort of hazy. I bet there
are readers of this list who don't realize what is going on, and I,
who've looked at the code extensively, still get surprised. And using
'd' reduces the compatibility problems in regard to the charset flags
to one, assuming my light-of-day analysis is correct.
The analysis is that I wrote the attached program and ran it on all the
keywords in the DATA section of keywords.pl. It finds all keywords that
consist only of characters that are regex modifiers. My claim is that a
new ambiguity exists only if the addition of any legal combination of
the new modifiers spells a keyword that isn't already spelled, and if
that keyword can occur without a syntax error in the context of
immediately following a regex. It prints out the various words spelled.
I then eyeballed the output looking for ones that I thought legally
could follow a regex. Perhaps people more familiar with the nuances of
Perl will see more. The output follows (note that cmp, ge, and x would
be forbidden under our new policy):
Pre-existing potential conflicts: cmp, cos, exec, exp, ge, m, pipe, pop,
pos, s, semop, x
New potential conflicts: close, else, grep, lc, le, log, or, our,
sleep, splice, uc, use, xor
New potential conflicts with /t: exists, exit, getc, getpgrp, gmtime,
goto, gt, msgget, oct, reset, semget, setpgrp, sort, tie, time, times, tr
New potential conflicts with /d: die, do, ord, redo, rmdir
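
For anyone who wants to reproduce the classification without the
attachment, here is a minimal sketch of the filtering step described
above. It is not the attached program. The flag sets are assumptions
inferred from the lists ('cegimopsx' as the flags that already exist,
'lur' as the new ones, 'l'/'u' plus the candidate 't' or 'd' as the
mutually exclusive charset flags), classify() is a hypothetical name,
and counting only distinct charset flags is my reading of the rule. It
does not check whether a word can legally follow a regex; that part was
done by eye.

```perl
use strict;
use warnings;

# Assumed flag sets; the message does not spell them out.
my $OLD     = 'cegimopsx';   # flags assumed to exist before the proposal
my $NEW     = 'lur';         # proposed 'l' and 'u', plus the recent 'r'
my $CHARSET = 'lu';          # mutually exclusive charset flags

# Classify one keyword.  $extra is the optional candidate flag ('t' or 'd').
# Returns 'old' if the word is spelled with existing flags only, 'new' if it
# needs a new flag, and undef if it cannot be read as a flag string at all.
sub classify {
    my ($word, $extra) = @_;
    $extra //= '';
    my $all     = $OLD . $NEW . $extra;
    my $charset = $CHARSET . $extra;

    return unless $word =~ /\A[$all]+\z/;    # contains a non-flag letter
    return 'old' if $word =~ /\A[$OLD]+\z/;  # already spellable today

    # Two different charset flags would be a syntax error, so such a word
    # can only be read as the keyword and no new ambiguity arises.
    my %seen = map { $_ => 1 }
               grep { index($charset, $_) >= 0 } split //, $word;
    return if keys(%seen) > 1;

    return 'new';
}

# A few words taken from the lists above, plus 'lt' from David's example.
for my $case (['cmp'], ['le'], ['xor'], ['gt', 't'], ['lt', 't'], ['die', 'd']) {
    my ($word, $extra) = @$case;
    printf "%-4s %-2s %s\n",
        $word, $extra // '-', classify($word, $extra) // 'none';
}
```

Running it over the keyword list three times (no extra flag, 't', 'd')
and collecting the 'new' results should give candidate lists of the
same shape as the ones above, to be eyeballed in the same way.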