perl.perl5.porters | Postings from August 2010

Re: RFC: New regex modifier flags; also the whimsical nature of backward compatibility; new 'r' flag has issues

David Golden
August 6, 2010 12:52
Let me see if I can sum up my understanding of the issue of conflicts.

Today, the behavior of regex modifiers is that if a valid regex
modifier (one of 'cegimopsx') is seen after the closing delimiter of
the regex, Perl will consume all valid modifiers.  For example, in
"s/foo/bar/ge", the "ge" is never interpreted as the "ge" operator.
If no modifiers appear, or once a non-modifier character is reached
after all valid modifiers are consumed, the remaining text is
interpreted as something other than a regex modifier.  Because of the
rules of syntax, the next thing after a regex must be either an
operator or a statement modifier, or else a syntax error will occur.

By adding letters to the list of regex modifiers, we create a
conflict with any operators or statement modifiers that start with the
same letter, because those are the only things that today would be
"legal" as a run-on with existing modifiers and that would "break" when
the newly added letter is consumed during parsing.

To restate that differently: we only care about the *first* letter of
operators and statement modifiers.
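As a rough illustration of the rule above, here is a toy model in Python (not Perl's real tokenizer) of how the text after the closing delimiter gets split. Valid modifier letters are consumed greedily, and whatever remains is handed to the rest of the parser. The modifier set is the 'cegimopsx' list from this message, whitespace handling is simplified, and the name split_modifiers is made up for the example.

```python
from __future__ import annotations

# Assumed modifier set: the "cegimopsx" list given in the message.
CURRENT = frozenset("cegimopsx")


def split_modifiers(after_delim: str, valid: frozenset[str]) -> tuple[str, str]:
    """Greedily consume modifier letters following the closing delimiter.

    Returns (modifiers, remainder); the remainder is what the parser sees
    next, e.g. an operator such as "lt" or a statement modifier such as "if".
    """
    i = 0
    while i < len(after_delim) and after_delim[i] in valid:
        i += 1
    return after_delim[:i], after_delim[i:]


# "s/foo/bar/ge": both letters are modifiers, never the "ge" operator.
print(split_modifiers("ge", CURRENT))              # ('ge', '')
# Run-on "s/a/b/gelt 3": "ge" are modifiers, "lt 3" is left for the parser.
print(split_modifiers("gelt 3", CURRENT))          # ('ge', 'lt 3')
# Adding "l" as a modifier silently changes that parse.
print(split_modifiers("gelt 3", CURRENT | {"l"}))  # ('gel', 't 3')
```

The last call shows the conflict: once "l" joins the set, the run-on "gelt" loses its "lt" operator, and only the first letter of "lt" mattered.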

For letters under discussion (including "r"), here is a list of
conflicting operators and modifiers:

    d: [no conflicts]
    l:  lt, le
    r: [no conflicts]
    t: [no conflicts]
    u: unless, until

So "r" is not a problem (which is good, since we already added it).
Neither "t" nor "d" is a problem.  The problems are "u" and "l".

For reference, here is a list of letters that are *not* already used
as regex modifiers and that do *not* appear at the start of *any*
operator or modifier: b, d, h, j, k, q, t, v, y, and z.  Any of these
could be used as modifiers today without a problem.
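Both lists above can be rechecked mechanically. This sketch assumes a hand-written list of Perl's alphabetic infix operators and statement modifiers (lt, gt, le, ge, eq, ne, cmp, and, or, xor, x, if, unless, while, until, for, foreach, when). That list is an assumption, not extracted from the Perl grammar, and conflicts and free_letters are illustrative names.

```python
from __future__ import annotations

import string

# Assumption: alphabetic infix operators and statement modifiers that could
# legally follow a regex in Perl 5 circa 2010 (not an authoritative list).
WORDS = ["lt", "gt", "le", "ge", "eq", "ne", "cmp", "and", "or", "xor", "x",
         "if", "unless", "while", "until", "for", "foreach", "when"]

# Existing regex modifiers per the message, plus the recently added "r".
MODIFIERS = set("cegimopsx") | {"r"}


def conflicts(letter: str) -> list[str]:
    """Only the first letter matters: words a new modifier would split."""
    return [w for w in WORDS if w[0] == letter]


def free_letters() -> list[str]:
    """Lowercase letters that are neither modifiers nor a word's first letter."""
    starts = {w[0] for w in WORDS}
    return [c for c in string.ascii_lowercase if c not in MODIFIERS | starts]


for letter in "dlrtu":
    print(letter, conflicts(letter) or "[no conflicts]")
print(free_letters())
```

Under that assumed word list, the output reproduces the conflict table (only "l" and "u" hit anything) and the ten free letters b, d, h, j, k, q, t, v, y, z.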

At the risk of taking the design discussion in circles, it occurs to
me that the whole conflict problem just goes away if we use "U" and
"L" instead of "u" and "l".  (We can use either "t" or "d" for the
third, since neither conflicts.)

I don't particularly care about making mutually exclusive regex
modifiers visually distinctive, since improper use should just be a
syntax error anyway, so I don't think that using upper case "means
anything" or has to set any precedent in that regard.  It's just a
character space that has no conflicts.

So that's my proposal:

* use "U", "L", and either "d" (for "dual"/"dumb") or "t" (for
* throw a syntax error if more than one of these appears in the list
of modifiers
* leave the run-on deprecation as is and make run-ons a syntax error
in 5.16 to eliminate any future conflict issue
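A minimal sketch of the two proposed syntax checks, again as a Python toy model rather than a patch to Perl. Because the proposal leaves the third letter open ("d" or "t"), "d" is assumed here purely for illustration. The exception class and function name are hypothetical, and a repeated letter (e.g. "UU") is treated as allowed, which the proposal does not specify.

```python
from __future__ import annotations

# Hypothetical: "d" assumed as the third mutually exclusive letter.
EXCLUSIVE = frozenset("ULd")
# Assumed modifier set after the change: today's letters plus U, L, d.
VALID = frozenset("cegimopsx") | EXCLUSIVE


class ModifierSyntaxError(Exception):
    """Stands in for a Perl compile-time syntax error."""


def parse_modifiers(after_delim: str) -> tuple[str, str]:
    """Consume modifiers greedily, then enforce the two proposed rules."""
    i = 0
    while i < len(after_delim) and after_delim[i] in VALID:
        i += 1
    mods, rest = after_delim[:i], after_delim[i:]
    # Rule 1: at most one distinct letter from the mutually exclusive group.
    seen = sorted(set(mods) & EXCLUSIVE)
    if len(seen) > 1:
        raise ModifierSyntaxError(
            f"regex modifiers {', '.join(seen)} are mutually exclusive")
    # Rule 2 (proposed for 5.16): a word run straight onto the modifiers.
    if rest[:1].isalpha():
        raise ModifierSyntaxError(
            f"no space between regex modifiers and {rest.split()[0]!r}")
    return mods, rest


print(parse_modifiers("gU"))  # ('gU', '')
try:
    parse_modifiers("gUL")
except ModifierSyntaxError as err:
    print(err)  # regex modifiers L, U are mutually exclusive
```

With run-ons rejected outright, any later letter added to VALID can no longer change the meaning of existing code, which is the point of the third bullet.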

That's simple and fixes the problem now without messing with features,
making assumptions about how to parse ambiguous situations, or
introducing wacky dual-letter regex modifiers.

All those in favor?

-- David
