Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
From: Ævar Arnfjörð Bjarmason
Date: August 7, 2010 13:52
Subject: Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues
Message ID: AANLkTi=Y0Td5CKjP3Uzd76JgxiNcgJoZcRk5kK5rg3Hm@mail.gmail.com

On Sat, Aug 7, 2010 at 15:31, karl williamson <public@khwilliamson.com> wrote:
> David Golden wrote:
>>
>> On Fri, Aug 6, 2010 at 8:28 PM, Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>>>
>>> +0.5
>>>
>>> I must say I don't like the tradeoff of making users pound their shift
>>> keys in perpetuity to produce /UL instead of /ul to maintain backwards
>>> compatibility with an unlikely-to-occur bit of syntax.
>>>
>>> Has anyone done tests to find out if cases like C</foo/lt "bar">
>>> actually occur in the wild (e.g. on CPAN)? Or is this just a runaway
>>> backwards compat hypothetical?
>>
>> I think it's a hypothetical.
>>
>> As I said originally, I'm 100% happy to declare run-ons to be a
>> syntax error, declare the parser to have a bug for not detecting it to
>> date, break whatever corner cases we happen to break, and just go
>> straight to /l and /u for modifiers.
>>
>> That didn't seem to get a lot of traction and Jesse seemed to indicate
>> he'd prefer that kind of breakage to have lexical scope using a
>> feature.
>>
>> Given the extra complexity of a "temporary" feature, I think uppercase
>> /L and /U (with the option of lower case synonyms in 5.16) is a
>> reasonable compromise.
>>
>> -- David
>>
>
> I'm back, and hope I have gotten some perspective. I really want to make
> progress, decide on something good, and go with it. I was willing to go
> with any of the options I had laid out; David's is a modification of one of
> them. But I note that both he and Jesse had just recently called the idea
> of using uppercase modifiers "crazy".
>
> So, I too would rather not condemn people to the extra keystroke in
> perpetuity; and I'm still not sure that we have to do it even in 5.14.
>
> David's analysis is correct (and is very similar to what Eric wrote a while
> back:
> http://www.nntp.perl.org/group/perl.perl5.porters/2010/05/msg160173.html
> ).
> To summarize, and be more precise, the cases where any of the potential new
> modifiers 'r', 'l', 'u', 'd', or 't' could be something other than modifiers,
> are:
>
> 1) a string of modifiers ending in 'lt'
> 2) a string of modifiers ending in 'le'
> 3) a string of modifiers ending in 'until'
> 4) a string of modifiers ending in 'unless'
>
> That means that the 'd' and 'r' modifiers are without conflicts, as David
> noted. But, when the parser sees the 'l' and 'u' characters in what so far
> is a string of regex modifiers, it can't be sure without look-ahead if they
> are modifiers or the beginning of the keywords above. But I believe that
> it can resolve all ambiguities with a sufficient amount of lookahead, and
> that in all but one case such lookahead is trivial. The 'l' and 't' modifiers
> cannot appear together because of our rules about their use, so 'lt' has to
> be the less-than operator; and note that 'e' is legal only in s///.
>
> Here's some pseudo code:
> case 'u':
>     if the next character is an 'n', not a modifier; otherwise is.
> case 'l':
>     if the next char is a 't', not a modifier
>     else if the next char is not an 'e', is a modifier
>     else if not in a s///, not a modifier
>     else if the next character beyond the e is alpha, is a modifier
>     else if the next thing in the input is an operand, not a modifier
>     else is a modifier
>
> So the apparent ambiguity is trivially resolved for /u. The only case where
> you need to look ahead more than one character is s///le. Correct me if I'm
> wrong about Perl, but I believe that whatever legally follows a binary
> operator must be an operand. I don't know how hard it is to look ahead and
> distinguish an operand from a non-operand; I haven't investigated.
> Perhaps someone can tell me. If it's easy, then the ambiguity is easily
> resolvable, and we don't need the capital letters in 5.14.

I like this, and feel silly for not thinking of it myself.

We should be able to put a bit more complexity in the parser with the
gain of not introducing capital-letter modifiers that we'd have to
support forever. That's a major plus.

The only edge cases would be for a few builtins, which can be handled
with appropriate warnings as you suggest.

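
To make the quoted lookahead rules concrete, here is a rough model of them in Python rather than in the tokenizer itself. The names is_modifier, looks_like_operand and in_substitution are made up for this illustration, and the operand check is only a crude guess, since nobody in the thread has yet looked at how hard that test really is.

```python
def looks_like_operand(text: str) -> bool:
    """Crude, assumed stand-in for 'the next thing in the input is an operand'.

    Skips whitespace and checks whether the next character could start a term
    (a quote, a sigil, an opening bracket, a digit or a bareword).
    """
    stripped = text.lstrip()
    if not stripped:
        return False
    first = stripped[0]
    return first.isalnum() or first in "_$@%&\\\"'([{"


def is_modifier(flag: str, rest: str, in_substitution: bool) -> bool:
    """Decide whether `flag` ('u' or 'l') is a regex modifier.

    `rest` is the input immediately after the flag character, and
    `in_substitution` is True for s/// (the only place 'e' is legal).
    """
    nxt = rest[:1]
    if flag == "u":
        # 'un...' starts 'until' or 'unless'; anything else is the modifier.
        return nxt != "n"
    if flag == "l":
        if nxt == "t":
            return False          # 'lt' must be the less-than operator
        if nxt != "e":
            return True
        if not in_substitution:
            return False          # 'e' is not a modifier outside s///
        after_e = rest[1:]
        if after_e[:1].isalpha():
            return True           # 'le' followed by a letter is not the operator
        if looks_like_operand(after_e):
            return False          # reads as the 'le' comparison operator
        return True
    raise ValueError("only 'u' and 'l' are modelled here")
```

With this model, is_modifier("l", 'e "bar"', True) is treated as the le operator, while is_modifier("l", "e;", True) is treated as the /l and /e modifiers.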
> If it's not easy, here's a counterproposal. We use the lowercase letters in
> 5.14. In the single subcase where we give up figuring it out, we assume
> that the 'le' are not modifiers, and print out a warning, saying to use
> capital L to get the modifier meaning. That is, in 5.14, we add both 'l',
> and 'L' modifiers and they both normally mean the same thing; but one can
> use the 'L' if necessary to cope with our laziness in not figuring out what
> was meant to begin with. The alternative is back to my original proposal,
> which is to tell them in the warning to spell it '...el' instead. I
> actually like that better, as it doesn't require a new modifier.

Rather than add an L modifier, we could simply suggest that when the
user writes this:

    /foo/le "bar"

they change it to:

    /(?l:foo)/e "bar"

to get the intended meaning, i.e. the l modifier rather than the le
comparison operator. That is, instead of doing:

    /foo/Le "bar"

I think that sucks less than introducing a new L synonym for l which
we're planning to throw away anyway. It's not too much to ask that the
user just fall back to (?:) when they've painted themselves into this
corner.
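
The rewrite is mechanical enough that a warning could spell it out for the user. A minimal Python sketch of that transformation follows; inline_locale_flag is a hypothetical helper, not anything in perl, and it assumes the pattern and its trailing flags have already been split apart.

```python
from typing import Tuple


def inline_locale_flag(pattern: str, flags: str) -> Tuple[str, str]:
    """Move a trailing 'l' flag into an inline (?l:...) group.

    Returns the rewritten pattern and the remaining trailing flags.
    Patterns without an 'l' flag are returned unchanged.
    """
    if "l" not in flags:
        return pattern, flags
    return "(?l:" + pattern + ")", flags.replace("l", "")
```

For the example above, inline_locale_flag("foo", "le") gives the pattern (?l:foo) with only e left as a trailing flag.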