develooper Front page | perl.perl5.porters | Postings from August 2010

Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues

From: karl williamson
Date: August 7, 2010 15:29
Subject: Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
Message ID: 4C5DDE3C.3060006@khwilliamson.com
Ævar Arnfjörð Bjarmason wrote:
> On Sat, Aug 7, 2010 at 15:31, karl williamson <public@khwilliamson.com> wrote:
>> David Golden wrote:
>>> On Fri, Aug 6, 2010 at 8:28 PM, Ævar Arnfjörð Bjarmason
>>> <avarab@gmail.com> wrote:
>>>> +0.5
>>>>
>>>> I must say I don't like the tradeoff of making users pound their shift
>>>> keys in perpetuity to produce /UL instead of /ul to maintain backwards
>>>> compatibility with an unlikely-to-occur bit of syntax.
>>>>
>>>> Has anyone done tests to find out if cases like C</foo/lt "bar">
>>>> actually occur in the wild (e.g. on CPAN)? Or is this just a runaway
>>>> backwards-compat hypothetical?
>>> I think it's a hypothetical.
>>>
>>> As I said originally, I'm 100% happy to declare run-ons to be a
>>> syntax error, declare the parser to have a bug for not detecting it to
>>> date, break whatever corner cases we happen to break, and just go
>>> straight to /l and /u for modifiers.
>>>
>>> That didn't seem to get a lot of traction and Jesse seemed to indicate
>>> he'd prefer that kind of breakage to have lexical scope using a
>>> feature.
>>>
>>> Given the extra complexity of a "temporary" feature, I think uppercase
>>> /L and /U (with the option of lower case synonyms in 5.16) is a
>>> reasonable compromise.
>>>
>>> -- David
>>>
>> I'm back, and hope I have gotten some perspective.  I really want to make
>> progress, decide on something good, and go with it.  I was willing to go
>> with any of the options I had laid out; David's is a modification of one of
>> them.  But I note that both he and Jesse had just recently called the idea
>> of using uppercase modifiers "crazy".
>>
>> So, I too would rather not condemn people to the extra keystroke in
>> perpetuity; and I'm still not sure that we have to do it even in 5.14.
>>
>> David's analysis is correct (and is very similar to what Eric wrote a while
>> back:
>> http://www.nntp.perl.org/group/perl.perl5.porters/2010/05/msg160173.html
>> ).
>> To summarize, and to be more precise, the cases where any of the potential
>> new modifiers 'r', 'l', 'd', or 't' could be something other than modifiers
>> are:
>>
>> 1) a string of modifiers ending in 'lt'
>> 2) a string of modifiers ending in 'le'
>> 3) a string of modifiers ending in 'until'
>> 4) a string of modifiers ending in 'unless'
>>
>> That means that the 'd' and 'r' modifiers are without conflicts, as David
>> noted.  But when the parser sees an 'l' or a 'u' in what so far is a
>> string of regex modifiers, it can't be sure without lookahead whether it is
>> a modifier or the beginning of one of the keywords above.  I believe,
>> though, that it can resolve all ambiguities with a sufficient amount of
>> lookahead, and that in all but one case, such lookahead is trivial.  The 'l'
>> and 't' modifiers cannot appear together because of our rules about their
>> use, so 'lt' has to be the string less-than operator; and note that 'e' is
>> legal only in s///.
>>
>> Here's some pseudo code:
>> case 'u':
>>    if the next character is an 'n', not a modifier; otherwise is.
>> case 'l':
>>    if the next char is a 't', not a modifier
>>    else if the next char is not an 'e', is a modifier
>>    else if not in a s///, not a modifier
>>    else if the next character beyond the e is alpha, is a modifier
>>    else if the next thing in the input is an operand, not a modifier
>>    else is a modifier
>>
>> So the apparent ambiguity is trivially resolved for /u.  The only case where
>> you need to look ahead more than one character is s///le.  Correct me if I'm
>> wrong about Perl, but I believe that what can legally follow a binary
>> operator must be an operand.  I don't know how hard it is to look ahead and
>> distinguish between an operand and a non-operand; I haven't investigated.
>> Perhaps someone can tell me.  If it's easy, then the ambiguity is easily
>> resolvable, and we don't need the capital letters in 5.14.
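The pseudocode above can be sketched in Python. This is a hypothetical, standalone illustration and not Perl's tokenizer, which is written in C (toke.c). `is_modifier` examines the text starting at an 'l' or 'u' found in a run of modifiers. `next_is_operand` is an assumed, deliberately crude stand-in for the operand test asked about above: it treats quotes, sigils, digits, opening brackets, unary operators, and any bareword other than an infix or statement-modifier keyword as the start of an operand. Both names are invented for this sketch.

```python
import re

# Words that may follow a pattern but can never start an operand
# (an illustrative list, not Perl's full grammar).
NON_OPERAND_WORDS = {"if", "unless", "while", "until", "for", "foreach",
                     "and", "or", "xor", "x", "lt", "gt", "le", "ge",
                     "eq", "ne", "cmp"}

# Characters that crudely suggest an operand: quotes, sigils, digits,
# opening brackets, unary operators, backslash (reference), '<' (readline).
OPERAND_START = "\"'`$@%&\\([{0123456789-+!~<"


def next_is_operand(rest: str) -> bool:
    """Crude guess at whether the remaining text begins with an operand."""
    rest = rest.lstrip()
    if not rest:
        return False
    word = re.match(r"[A-Za-z_]\w*", rest)
    if word:
        return word.group(0) not in NON_OPERAND_WORDS
    return rest[0] in OPERAND_START


def is_modifier(src: str, pos: int, in_substitution: bool) -> bool:
    """Decide whether src[pos] ('u' or 'l'), reached while scanning a run of
    regex modifiers, is a modifier rather than the start of a keyword."""
    ch = src[pos]
    nxt = src[pos + 1:pos + 2]
    if ch == "u":
        # 'un' can only begin 'until' or 'unless'.
        return nxt != "n"
    if ch == "l":
        if nxt == "t":
            return False  # 'l' and 't' can't be combined, so this is 'lt'
        if nxt != "e":
            return True
        if not in_substitution:
            return False  # /e is legal only in s///, so this is 'le'
        after_e = src[pos + 2:pos + 3]
        if after_e.isalpha():
            return True  # more modifier letters follow, e.g. 'lei'
        # The one case needing real lookahead: 'le' is the comparison
        # operator only if an operand follows it.
        return not next_is_operand(src[pos + 2:])
    raise ValueError("only 'u' and 'l' need disambiguation")
```

With this sketch, `is_modifier('le "bar"', 0, True)` returns False (the comparison-operator reading), while `is_modifier('le;', 0, True)` and `is_modifier('le if $x', 0, True)` return True.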
> 
> I like this, and feel silly for not thinking of it myself.
> 
> We should be able to put a bit more complexity in the parser with the
> gain of not introducing capital-letter modifiers that we'd have to
> support forever. That's a major plus.
> 
> The only edge cases would be for a few builtins, which can be handled
> with appropriate warnings as you suggest.

I don't understand what you mean by "a few builtins".
> 
>> If it's not easy, here's a counterproposal.  We use the lowercase letters in
>> 5.14.  In the single subcase where we give up trying to figure it out, we
>> assume that 'le' is not a pair of modifiers, and print a warning saying to
>> use capital L to get the modifier meaning.  That is, in 5.14, we add both
>> 'l' and 'L' modifiers, and they both normally mean the same thing; but one
>> can use 'L' if necessary to cope with our laziness in not figuring out what
>> was meant to begin with.  The alternative is to go back to my original
>> proposal, which is to tell them in the warning to spell it '...el' instead.
>> I actually like that better, as it doesn't require a new modifier.
> 
> Rather than add an L modifier, we could simply suggest that when the
> user writes this:
> 
>     /foo/le "bar"

I want to reiterate that the above is actually not a problem, since reading
'le' there as modifiers would be invalid: /e applies only to substitutions.
The only form that needs attention is something like:
     s/foo/abc/le "bar"
So the universe of problems is much smaller.
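The narrowing argued for here can be checked mechanically. The sketch below is a hypothetical illustration under stated assumptions: the m// and s/// modifier sets are roughly those of Perl 5.12, the candidate new letters are 'l', 'd', 't' (from the list above) plus 'u' (which the pseudocode also treats), and `FOLLOWERS` is an illustrative, not exhaustive, list of word operators and statement modifiers that might be run on to a pattern. A word counts as a hard case only if it starts with a new letter, consists entirely of legal modifiers, and does not combine 'l' with 't'.

```python
# Assumed modifier sets, roughly those of Perl 5.12.
MATCH_MODS = set("msixpogc")
SUBST_MODS = set("msixpoge")
# Assumed candidate letters. 'r' (s/// only) is left out: no word below
# starts with it.
PROPOSED = set("ldtu")
# Letter pairs that may not be combined in one modifier run.
EXCLUSIVE = [set("lt")]
# Word operators and statement modifiers that might be run on to a
# pattern (illustrative, not exhaustive).
FOLLOWERS = ["lt", "gt", "le", "ge", "eq", "ne", "cmp", "x", "and", "or",
             "xor", "if", "unless", "while", "until", "for", "foreach"]


def hard_cases(is_subst: bool) -> list[str]:
    """Words that could be read as a run of legal modifiers, and so need
    more than one character of lookahead to tell apart."""
    legal = (SUBST_MODS if is_subst else MATCH_MODS) | PROPOSED
    found = []
    for word in FOLLOWERS:
        letters = set(word)
        if word[0] not in PROPOSED:
            continue  # not affected by the new letters
        if not letters <= legal:
            continue  # contains a non-modifier letter: short lookahead decides
        if any(pair <= letters for pair in EXCLUSIVE):
            continue  # an illegal combination, so it must be the keyword
        found.append(word)
    return found
```

Under these assumptions `hard_cases(False)` returns an empty list and `hard_cases(True)` returns `['le']`, consistent with s///le being the only case that needs real lookahead.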
> 
> They need to change that to:
> 
>     /(?l:foo)/e "bar"
> 
> to get the intended meaning (modifiers, not the le comparison operator),
> i.e. instead of writing:
> 
>     /foo/Le "bar"
> 
> I think that sucks less than introducing a new L synonym for l which
> we're planning to throw away anyway. It's not too much to ask that the
> user just fall back to (?:) when they've painted themselves into this
> corner.
> 

What I was trying to suggest is that it could be rewritten as:
     s/foo/abc/el "bar"
without having to go to the (?:) notation, and the warning message could 
say that.
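The '...el' rewrite could be produced directly in the warning text. This is a hypothetical sketch: `le_warning` and its message wording are invented for illustration and are not an actual perl diagnostic. It assumes the parser has already fallen back to reading a trailing 'le' in an s/// modifier run as the comparison operator.

```python
from typing import Optional


def le_warning(mods: str) -> Optional[str]:
    """Hypothetical warning for an s/// modifier run ending in 'le' that the
    parser has chosen to read as the 'le' operator."""
    if not mods.endswith("le"):
        return None
    # Swapping the last two letters keeps the same flags; by the rules in
    # the pseudocode, an 'l' not followed by 'e' or 't' is always a modifier.
    fixed = mods[:-2] + "el"
    return (f"Ambiguous 's///{mods}': treating 'le' as the string "
            f"less-than-or-equal operator; write 's///{fixed}' if you "
            f"meant the modifiers")
```

For example, `le_warning("gle")` suggests writing `s///gel`.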



nntp.perl.org: Perl Programming lists via nntp and http.