
Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues

From: karl williamson
Date: August 6, 2010 11:07
Subject: Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
Message ID: 4C5C4F32.4050000@khwilliamson.com

Oops, forgot to attach the program

karl williamson wrote:
> H.Merijn Brand wrote:
>> On Fri, 6 Aug 2010 08:14:02 -0400, David Golden <xdaveg@gmail.com>
>> wrote:
>>
>>> On Fri, Aug 6, 2010 at 7:36 AM, karl williamson 
>>> <public@khwilliamson.com> wrote:
>>>> I did an analysis of this, and it turns out that the only ambiguous case is
>>>> 's/foo/bar/le'.  It seems like overkill for this to invent a new temporary
>>>> pragma, and forbid all the new modifiers as suffixes, when there is no
>>>> ambiguity at all outside of substitutions, and no ambiguity using
>>>> substitutions except for one combination out of all those possible.  Why
>>>> can't we just say in the pods and warning message that '/le' must be
>>>> written as '/el' in 5.14?
>>> Help me understand what you mean by ambiguous. If there is really only
>>> one case, then great!
>>>
>>> But hypothetically, what would s/foo/bar/elt1 do?  Would the "l" parse
>>> as a modifier or would it parse as part of "lt"?
>>>
>>> Here's a stupid, but legal example:
>>>
>>>   $ perl -wE '$_=<>; sub bar { "bar" }; if ( s/foo/bar/elt 1 ) { say "not done" }'
>>
>> And mind you that some module might add those flags dynamically. I know
>> Abigail does some funky stuff, but I bet others do too. Then an eval of
>> code with a generated regex where the l got inserted just before the e
>> will suddenly fail.
>>
>> FWIW I have no strong opinions here, just pointing to possible places
>> of hurt.
>>
> 
> I'm not sure where to start.
> 
> So I'll start here:  My waking-up-in-the-middle-of-the-night analysis
> was somewhat flawed (latest included at the end for you to check).
> 
> First, when I said 'le' was the only possible conflict, I should have
> said any combination that has 'le' in it, such as '/gle'.  Actually, I
> think it is any combination that ends in 'le', so '/glex' isn't ambiguous.
> Anyway, that really doesn't change the original claim.
> 
> And the new analysis shows one additional issue I had overlooked if we 
> use the 't' modifier, that issue being 'gt'.  If we switch to using 'd' 
> instead, as I'd already been leaning towards, it goes back down to the 
> single problem ('le') that I identified earlier.
> 
> However, the new analysis shows two problems with the recently added 'r' 
> modifier: '/or' and '/xor'.  Hence I've retitled the subject of this 
> post to include Yves' earlier comment on the whimsical nature of finding 
> these backward compatibility issues.  'r' was added with nary a peep, 
> IIRC, about such things.  There is a .t patch in the queue somewhere, 
> BTW, which if it had ever been applied, I think would have found these.
> 
> The reason there are so few charset modifier issues is that we
> decided that using more than one of the mutually exclusive flags
> would be a syntax error.  Thus 'lt' in David's example is not ambiguous.
> It has to mean the less-than operator, because any other reading
> leads to a syntax error.
> 
> I had come to prefer 'd' over 't' as the modifier meaning the
> traditional behavior, because I think it succinctly describes what is
> happening: the character set used is like a dual-valued variable.  It
> can be native sometimes, and Unicode other times.  (I personally think
> this behavior is crazy.)  'd' really lays out what this means, whereas
> 'traditional' is sort of hazy.  I bet there are readers of this list who
> don't realize what is going on, and I, who've looked at the code
> extensively, still get surprised.  And using 'd' reduces the
> compatibility problems in regard to the charset flags to one, assuming
> my light-of-day analysis is correct.
> 
> The analysis is that I wrote the attached program and ran it on all the 
> keywords in the DATA section of keywords.pl.  It finds all keywords that 
> consist only of characters that are regex modifiers.  My claim is that a 
> new ambiguity exists only if the addition of any legal combination of 
> the new modifiers spells a keyword that isn't already spelled, and if 
> that keyword can occur without a syntax error in the context of 
> immediately following a regex.  It prints out the various words spelled.
> I then eyeballed the output looking for ones that I thought legally
> could follow a regex.  Perhaps people more familiar with the nuances of 
> Perl will see more.  The output follows (note that cmp, ge, and x would 
> be forbidden under our new policy):
> 
> Pre-existing potential conflicts: cmp, cos, exec, exp, ge, m, pipe, pop,
> pos, s, semop, x
>
> New potential conflicts: close, else, grep, lc, le, log, or, our,
> sleep, splice, uc, use, xor
>
> New potential conflicts with /t: exists, exit, getc, getpgrp, gmtime,
> goto, gt, msgget, oct, reset, semget, setpgrp, sort, tie, time, times, tr
>
> New potential conflicts with /d: die, do, ord, redo, rmdir
> 
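
For readers without the attachment, here is a rough sketch of the kind of
check described above. It is not the attached program. The modifier sets
('cegimopsx' as pre-existing, 'r' as the recent addition, 'l' and 'u' as
charset flags) are assumptions inferred from the output lists quoted above,
the sub name classify_word is made up for illustration, and the word list is
a small hand-picked sample rather than the DATA section of keywords.pl.

```perl
use strict;
use warnings;

# Assumed modifier sets, inferred from the quoted output (not authoritative).
my $OLD     = 'cegimopsx';   # modifiers that already existed
my $NEW     = 'r';           # recently added, not a charset flag
my @CHARSET = qw(l u);       # proposed charset flags, mutually exclusive

# Classify one word, given the candidate letter for traditional
# semantics ('t' or 'd').  Returns 'old', 'new', 'candidate', or undef
# when the word cannot be read as a legal run of modifiers.
sub classify_word {
    my ($word, $candidate) = @_;
    my %charset = map { $_ => 1 } @CHARSET, $candidate;
    my %legal   = map { $_ => 1 } split(//, $OLD . $NEW), keys %charset;

    my %seen_charset;
    for my $ch (split //, $word) {
        return unless $legal{$ch};            # not a modifier letter
        $seen_charset{$ch} = 1 if $charset{$ch};
    }

    # Two different charset flags would be a syntax error, so the word
    # can only be an operator or keyword (for example 'lt').
    return if keys(%seen_charset) > 1;

    return 'old'       if $word =~ /\A[$OLD]+\z/;
    return 'candidate' if $seen_charset{$candidate};
    return 'new';
}

# Hand-picked sample words; the real run used every keyword.
my @sample = qw(ge le lt gt or xor die print);
for my $candidate (qw(t d)) {
    for my $word (@sample) {
        my $class = classify_word($word, $candidate);
        printf "/%s: %-5s %s\n", $candidate, $word, $class // 'no conflict';
    }
}
```

This only finds words that can be spelled with modifier letters. Deciding
which of them can legally follow a regex still takes the eyeballing step
described above.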

