develooper Front page | perl.perl5.porters | Postings from August 2010

Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues

karl williamson
August 7, 2010 08:31
Message ID:
David Golden wrote:
> On Fri, Aug 6, 2010 at 8:28 PM, Ævar Arnfjörð Bjarmason
> <> wrote:
>> +0.5
>> I must say I don't like the tradeoff of making users pound their shift
>> keys in perpetuity to produce /UL instead of /ul to maintain backwards
>> compatibility with an unlikely-to-occur bit of syntax.
>> Has anyone done tests to find out if cases like C</foo/lt "bar">
>> actually occur in the wild (e.g. on CPAN)? Or is this just a runaway
>> backwards compat hypothetical?
> I think it's a hypothetical.
> As I said originally, I'm 100% happy to declare run-ons to be a
> syntax error, declare the parser to have a bug for not detecting it to
> date, break whatever corner cases we happen to break, and just go
> straight to /l and /u for modifiers.
> That didn't seem to get a lot of traction and Jesse seemed to indicate
> he'd prefer that kind of breakage to have lexical scope using a
> feature.
> Given the extra complexity of a "temporary" feature, I think uppercase
> /L and /U (with the option of lower case synonyms in 5.16) is a
> reasonable compromise.
> -- David

I'm back, and hope I have gotten some perspective.  I really want to 
make progress, decide on something good, and go with it.  I was willing 
to go with any of the options I had laid out; David's is a modification 
of one of them.  But I note that both he and Jesse had just recently 
called the idea of using uppercase modifiers "crazy".

So I too would rather not condemn people to the extra keystroke in 
perpetuity, and I'm still not sure we have to do it even in 5.14.

David's analysis is correct (and very similar to what Eric wrote a 
while back). To summarize more precisely, the cases where any of the 
potential new modifiers 'r', 'l', 'd', or 't' could be read as 
something other than modifiers are:

1) a string of modifiers ending in 'lt'
2) a string of modifiers ending in 'le'
3) a string of modifiers ending in 'until'
4) a string of modifiers ending in 'unless'

That means that the 'd' and 'r' modifiers are free of conflicts, as 
David noted.  But when the parser sees an 'l' or 'u' in what so far is 
a string of regex modifiers, it can't tell without lookahead whether 
it is a modifier or the start of one of the operators or keywords 
above.  I believe, however, that it can resolve every ambiguity with 
enough lookahead, and that in all but one case the lookahead needed is 
trivial.  The 'l' and 't' modifiers cannot appear together under our 
rules for their use, so 'lt' has to be the string less-than operator. 
Note also that 'e' is a legal modifier only in s///.
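The four endings listed above can be spotted mechanically. The following is a minimal Perl sketch, not taken from perl's tokenizer: the sub name possible_split() is hypothetical, and it only checks whether a run of letters after a pattern's closing delimiter could be split into a modifier prefix plus one of 'lt', 'le', 'until', or 'unless'. It does not check that the prefix letters are valid modifiers.

```perl
use strict;
use warnings;

# Hypothetical helper: given the run of letters that immediately follows
# a pattern's closing delimiter, report whether it could instead be read
# as a prefix of modifiers followed by one of the four words listed
# above. Returns (prefix, word), or an empty list if none applies.
# The prefix letters are NOT checked for being valid modifiers.
sub possible_split {
    my ($run) = @_;
    # The four words have distinct endings, so at most one can match.
    if ($run =~ /\A([a-z]*)(lt|le|until|unless)\z/) {
        return ($1, $2);
    }
    return;
}
```

For the example from the quoted message, C</foo/lt "bar">, the run is 'lt', which splits into an empty prefix and the operator 'lt'; a run such as 'gc' yields an empty list.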

Here's some pseudocode:
case 'u':
     if the next character is 'n' ("until", "unless"), not a modifier;
     otherwise a modifier
case 'l':
     if the next char is 't' ("lt"), not a modifier
     else if the next char is not 'e', a modifier
     else if not in s///, not a modifier ("le")
     else if the character after the 'e' is alphabetic, a modifier
     else if the next thing in the input is an operand, not a modifier
     else a modifier
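The pseudocode above can be turned into a runnable Perl sketch. It works on the raw text that follows the letter in question, so it only illustrates the order of the decisions, not how perl's tokenizer is actually structured. The sub names is_modifier() and looks_like_operand() are hypothetical, and looks_like_operand() is a deliberately crude stand-in for the operand test that the last 'l' rule needs.

```perl
use strict;
use warnings;

# Hypothetical, crude operand test: does $text begin (after optional
# whitespace) with a quote, a variable sigil, an opening paren, or a
# digit? Barewords are deliberately not handled.
sub looks_like_operand {
    my ($text) = @_;
    return $text =~ /\A\s*(?:["'\$\@%(]|\d)/ ? 1 : 0;
}

# Sketch of the decision procedure. $rest is the input starting at the
# 'u' or 'l' just seen after a pattern; $in_subst is true inside s///.
# Returns 1 if the letter should be taken as a modifier, 0 otherwise.
sub is_modifier {
    my ($rest, $in_subst) = @_;
    my $c    = substr($rest, 0, 1);
    my $next = length($rest) > 1 ? substr($rest, 1, 1) : '';

    if ($c eq 'u') {
        # 'un' can only be the start of "until" or "unless"
        return $next eq 'n' ? 0 : 1;
    }
    if ($c eq 'l') {
        return 0 if $next eq 't';     # /l and /t can't combine: this is 'lt'
        return 1 if $next ne 'e';     # any other follower: /l is a modifier
        return 0 unless $in_subst;    # /e exists only in s///: this is 'le'
        my $after_e = substr($rest, 2);
        return 1 if $after_e =~ /\A[A-Za-z]/;      # more modifiers, e.g. 'leg'
        return 0 if looks_like_operand($after_e);  # 'le' used as an operator
        return 1;                                  # /l followed by /e
    }
    return 1;    # letters other than 'u' and 'l' are outside this sketch
}
```

For instance, is_modifier('le "bar"', 1) returns 0 (the 'le' operator), while is_modifier('le;', 1) returns 1 (/l followed by /e). Barewords are left out of looks_like_operand() on purpose: after s///le a bareword might begin an operand, such as a function call, or be a statement modifier such as 'if', which is exactly the case the post says may not be easy to decide.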

So the apparent ambiguity is trivially resolved for /u.  The only case 
where you need to look ahead more than one character is s///le.  Correct 
me if I'm wrong about Perl, but I believe that whatever can legally 
follow a binary operator must be an operand.  I don't know how hard it 
is to look ahead and distinguish an operand from a non-operand; I haven't 
investigated.  Perhaps someone can tell me.  If it's easy, then the 
ambiguity is easily resolvable, and we don't need the capital letters in 

If it's not easy, here's a counterproposal.  We use the lowercase 
letters in 5.14.  In the single subcase where we give up trying to 
figure it out, we assume that the 'le' is not a pair of modifiers and 
print a warning telling the user to use a capital L to get the modifier 
meaning.  That is, in 5.14 we add both 'l' and 'L' modifiers, which 
normally mean the same thing, but one can use 'L' when necessary to cope 
with our laziness in not working out what was meant to begin with.  The 
alternative is to go back to my original proposal, which is to have the 
warning tell users to spell it '...el' instead.  I actually like that 
better, as it doesn't require a new modifier.
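A minimal sketch of the fallback in the counterproposal, assuming the lookahead has already given up on an s///le. The sub name resolve_unclear_le() and the warning wording are hypothetical; the $style argument chooses between the two remedies discussed, a capital 'L' or respelling the modifiers as '...el'.

```perl
use strict;
use warnings;

# Hypothetical fallback for the one unresolved subcase: s///le followed
# by something the lookahead cannot classify. The 'le' is taken as the
# string comparison operator, and the returned message tells the user
# how to get the modifier meaning instead.
#   $prefix: modifier letters preceding the 'le', e.g. 'g'
#   $style:  'reorder' (spell it ...el) or 'capital' (use 'L')
sub resolve_unclear_le {
    my ($prefix, $style) = @_;
    $style //= 'reorder';
    my $fix = $style eq 'capital' ? "${prefix}Le" : "${prefix}el";
    my $msg = "Treating 'le' after s///$prefix as the string operator; "
            . "write s///$fix if you meant the modifiers";
    return ('operator', $msg);
}
```

Calling resolve_unclear_le('g') returns 'operator' and a message suggesting s///gel; passing 'capital' as the second argument suggests s///gLe instead. The 'reorder' default mirrors the preference stated above, since it needs no new modifier.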

So I guess I'm still pushing my proposal.  What is different is that I 
think we now agree that the problematic cases are very few, which is 
why my proposal makes sense at all.  I hope I've persuaded you that 
there really is only one case that may not be easily resolvable, and I 
still think an appropriate warning is sufficient to handle it.
