perl.perl5.porters | Postings from August 2010

Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues

Ævar Arnfjörð Bjarmason
August 7, 2010 13:52
On Sat, Aug 7, 2010 at 15:31, karl williamson wrote:
> David Golden wrote:
>> On Fri, Aug 6, 2010 at 8:28 PM, Ævar Arnfjörð Bjarmason wrote:
>>> +0.5
>>> I must say I don't like the tradeoff of making users pound their shift
>>> keys in perpetuity to produce /UL instead of /ul to maintain backwards
>>> compatibility with an unlikely-to-occur bit of syntax.
>>> Has anyone done tests to find out if cases like C</foo/lt "bar">
>>> actually occur in the wild (e.g. on CPAN)? Or is this just a runaway
>>> backwards compat hypothetical?
>> I think it's a hypothetical.
>> As I said originally, I'm 100% happy to declare run-ons to be a
>> syntax error, declare the parser to have a bug for not detecting it to
>> date, break whatever corner cases we happen to break, and just go
>> straight to /l and /u for modifiers.
>> That didn't seem to get a lot of traction and Jesse seemed to indicate
>> he'd prefer that kind of breakage to have lexical scope using a
>> feature.
>> Given the extra complexity of a "temporary" feature, I think uppercase
>> /L and /U (with the option of lower case synonyms in 5.16) is a
>> reasonable compromise.
>> -- David
> I'm back, and hope I have gotten some perspective.  I really want to make
> progress, decide on something good, and go with it.  I was willing to go
> with any of the options I had laid out; David's is a modification of one of
> them.  But I note that both he and Jesse had just recently called the idea
> of using uppercase modifiers "crazy".
> So, I too would rather not condemn people to the extra keystroke in
> perpetuity; and I'm still not sure that we have to do it even in 5.14.
> David's analysis is correct (and is very similar to what Eric wrote a while
> back:
> ).
> To summarize, and be more precise, the cases where any of the potential new
> modifiers 'r', 'l', 'd', 'u', or 't' could be something other than
> modifiers are:
> 1) a string of modifiers ending in 'lt'
> 2) a string of modifiers ending in 'le'
> 3) a string of modifiers ending in 'until'
> 4) a string of modifiers ending in 'unless'
> That means that the 'd' and 'r' modifiers are without conflicts, as David
> noted.  But when the parser sees an 'l' or 'u' character in what so far
> is a string of regex modifiers, it can't be sure without lookahead whether
> it is a modifier or the beginning of one of the keywords above.  I believe
> that it can resolve all ambiguities with a sufficient amount of lookahead,
> and that in all but one case such lookahead is trivial.  The 'l' and 't'
> modifiers cannot appear together because of our rules about their use, so
> 'lt' has to be the less-than operator; and note that 'e' is legal only in
> s///.
> Here's some pseudo code:
> case 'u':
>    if the next character is an 'n', not a modifier; otherwise is.
> case 'l':
>    if the next char is a 't', not a modifier
>    else if the next char is not an 'e', is a modifier
>    else if not in a s///, not a modifier
>    else if the next character beyond the e is alpha, is a modifier
>    else if the next thing in the input is an operand, not a modifier
>    else is a modifier
> So the apparent ambiguity is trivially resolved for /u.  The only case where
> you need to look ahead more than one character is s///le.  Correct me if I'm
> wrong about Perl, but I believe that whatever legally follows a binary
> operator must be an operand.  I don't know how hard it is to look ahead and
> distinguish between an operand and a non-operand; I haven't investigated.
> Perhaps someone can tell me.  If it's easy, then the ambiguity is easily
> resolvable, and we don't need the capital letters in 5.14.
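
The pseudocode quoted above can be modelled as a small runnable function.
What follows is only a minimal sketch in Python, not the logic of the real
Perl tokenizer: the names is_modifier and looks_like_operand are
hypothetical, and the operand test is a deliberately crude stand-in for the
lookahead that Karl says he has not investigated (it only recognises a
leading quote, sigil, or digit).

```python
def looks_like_operand(rest: str) -> bool:
    """Hypothetical heuristic (an assumption, not Perl's real rule):
    after optional whitespace, does the input start like an operand,
    i.e. with a quote, a sigil, or a digit?"""
    rest = rest.lstrip()
    return bool(rest) and (rest[0] in "\"'$@%" or rest[0].isdigit())


def is_modifier(text: str, i: int, in_substitution: bool) -> bool:
    """Decide whether text[i] ('u' or 'l'), found in a run of regex
    modifiers, is itself a modifier. Mirrors the quoted pseudocode
    branch by branch."""
    ch = text[i]
    nxt = text[i + 1:i + 2]  # empty string at end of input

    if ch == "u":
        # 'un...' can only start 'until' or 'unless'.
        return nxt != "n"

    if ch == "l":
        if nxt == "t":
            return False  # 'lt' must be the less-than operator
        if nxt != "e":
            return True
        if not in_substitution:
            return False  # 'e' is legal only on s///, so this is 'le'
        beyond = text[i + 2:i + 3]
        if beyond.isalpha():
            return True  # e.g. s///lei: more modifiers follow
        if looks_like_operand(text[i + 2:]):
            return False  # 'le' followed by an operand: comparison
        return True

    raise ValueError("only 'u' and 'l' are ambiguous in this sketch")


# A few of the cases discussed in the thread:
print(is_modifier('lt "bar"', 0, in_substitution=False))  # False
print(is_modifier('le "bar"', 0, in_substitution=True))   # False
print(is_modifier("le;", 0, in_substitution=True))        # True
print(is_modifier("until", 0, in_substitution=False))     # False
```

Under these assumptions only the s///le branch needs to look past the next
character, which matches the claim that every other case is trivial.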

I like this, and feel silly for not thinking of it myself.

We should be able to accept a bit more complexity in the parser in
exchange for not introducing capital-letter modifiers that we'd have to
support forever. That's a major plus.

The only edge cases would be for a few builtins, which can be handled
with appropriate warnings as you suggest.

> If it's not easy, here's a counterproposal.  We use the lowercase letters in
> 5.14.  In the single subcase where we give up figuring it out, we assume
> that the 'le' are not modifiers, and print out a warning, saying to use
> capital L to get the modifier meaning.  That is, in 5.14, we add both 'l',
> and 'L' modifiers and they both normally mean the same thing; but one can
> use the 'L' if necessary to cope with our laziness in not figuring out what
> was meant to begin with.   The alternative is back to my original proposal,
> which is to tell them in the warning to spell it '...el' instead.  I
> actually like that better, as it doesn't require a new modifier.

Rather than add an L modifier, we could simply suggest that when the
user writes this:

    /foo/le "bar"

they change it to:

    /(?l:foo)/e "bar"

to get the intended "not the le comparison operator"
meaning, i.e. instead of doing:

    /foo/Le "bar"

I think that sucks less than introducing a new L synonym for l which
we're planning to throw away anyway. It's not too much to ask that the
user just fall back to (?:) when they've painted themselves into this
