David Golden wrote:
> Let me see if I can sum up my understanding of the issue of conflicts.
>
> Today, the behavior of regex modifiers is that if a valid regex
> modifier (one of 'cegimopsx') is seen after the closing delimiter of
> the regex, Perl will consume all valid modifiers.  For example, in
> "s/foo/bar/ge", the "ge" is never interpreted as the "ge" operator.
> If no modifiers appear or if a non-modifier character appears after
> all valid modifiers are consumed, it is interpreted as something other
> than a regex modifier.  Because of the rules of syntax, the next thing
> after a regex must be either an operator or a statement modifier or
> else a syntax error will occur.
>
> By adding letters to the list of regex modifiers, we create a
> conflict with any operators or statement modifiers that start with the
> same letter because those are the only things that today would be
> "legal" as a run-on with existing modifiers that would "break" when
> the newly added letter is consumed during parsing.
>
> To restate that differently: we only care about the *first* letter of
> operators and modifiers.
>
> For letters under discussion (including "r"), here is a list of
> conflicting operators and modifiers:
>
>   d: [no conflicts]
>   l: lt, le
>   r: [no conflicts]
>   t: [no conflicts]
>   u: unless, until
>
> So "r" is not a problem (which is good, since we already added it).
> Both "t" and "d" are not problems.  The problems are "u" and "l".
>
> For reference, here is a list of letters that are *not* already used
> as regex modifiers and that do *not* appear at the start of *any*
> operator or modifier: b, d, h, j, k, q, t, v, y, and z.  Any of these
> could be used as modifiers today without a problem.

Your statement above made a lot of sense to me, and left me wondering
whether my muddled thinking was a result of lack of sleep or
Alzheimer's.  But what I hadn't made clear to you guys is that it isn't
as bad as this.
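For reference, the first-letter rule in the quoted analysis can be sketched in a few lines of Python. This is only an illustration, not how the tokenizer works: the FOLLOWERS list is a hand-picked assumption about which word operators and statement modifiers can follow a regex, and "r" is counted as already taken because the quoted text says it has been added.

```python
import string

# Assumption: the modifiers quoted above ('cegimopsx') plus 'r',
# which the quoted text says has already been added.
EXISTING_MODIFIERS = set("cegimoprsx")

# Assumption: a hand-picked list of word operators and statement
# modifiers that could legally follow a regex. Not taken from perl's source.
FOLLOWERS = ["lt", "gt", "le", "ge", "eq", "ne", "cmp", "x",
             "and", "or", "xor",
             "if", "unless", "while", "until", "for", "foreach"]


def first_letter_conflicts(letter, followers=FOLLOWERS):
    """Words that start with a proposed modifier letter."""
    return [w for w in followers if w.startswith(letter)]


def free_letters(existing=EXISTING_MODIFIERS, followers=FOLLOWERS):
    """Letters that are neither modifiers nor the first letter of a follower."""
    taken = set(existing) | {w[0] for w in followers}
    return [c for c in string.ascii_lowercase if c not in taken]


if __name__ == "__main__":
    for letter in "dlrtu":
        print(letter, first_letter_conflicts(letter) or "[no conflicts]")
    print(free_letters())
```

Under those assumptions the sketch gives the same table as the quoted analysis: only "l" and "u" pick up conflicts.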

The parser would do look-ahead to rule out things like unless and
until.  They can't be regex modifiers because they have an 'n' in them.
The only things that are ambiguous are keywords consisting entirely of
regex modifier letters.  As I said before, 'lt' is invalid as a pair of
modifiers even if we use /t as the modifier, so it has to be less-than.
And I do think (though I don't have time to think more about it right
now) that 'gt' is an issue, even though you don't.

> At the risk of taking the design discussion in circles, it occurs to
> me that the whole conflict problem just goes away if we use "U" and
> "L" instead of "u" and "l".  (We can use either "t" or "d" for the
> third since they don't conflict).
>
> I don't particularly care about making mutually-exclusive regex
> modifiers visually distinctive, since improper use should just be a
> syntax error anyway, so I don't think that using upper case "means
> anything" or has to set any precedent in that regard.  It's just a
> character space that has no conflicts.
>
> So that's my proposal:
>
> * use "U", "L", and either "d" (for "dual"/"dumb") or "t" (for
>   "text"/"traditional")
> * throw a syntax error if more than one of these appears in the list
>   of modifiers
> * leave the run-on deprecation as is and make run-ons a syntax error
>   in 5.16 to eliminate any future conflict issue

Off the top of my head, subject to further reflection, I don't have a
problem with this.  Why your analysis doesn't pick up 'gt' remains a
concern: are you or I wrong here?  And should the d be D for
consistency?  I'll ponder.

> That's simple and fixes the problem now without messing with features,
> making assumptions about parsing ambiguous situations or introducing
> wacky dual-letter regex modifiers.
>
> All those in favor?
>
> -- David

And, I'm starting to learn this new-fangled IRC stuff.
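
P.S. A rough Python sketch of the look-ahead rule described above, to make the 'gt' question concrete. It is only a model of the rule as stated, not of the real tokenizer. The WORDS list is a hand-picked assumption, and the lowercase "l", "t", "u" set is the hypothetical mutually exclusive trio discussed earlier in the thread.

```python
# Assumption: hand-picked word operators and statement modifiers
# that could follow a regex. Not taken from perl's source.
WORDS = ["lt", "gt", "le", "ge", "eq", "ne", "cmp", "x",
         "and", "or", "xor",
         "if", "unless", "while", "until", "for", "foreach"]

TODAY = set("cegimoprsx")        # quoted modifiers plus the already-added 'r'
EXCLUSIVE = set("ltu")           # hypothetical: at most one of these per regex
PROPOSED = TODAY | EXCLUSIVE


def could_be_modifiers(word, modifiers, exclusive=frozenset()):
    """True if every letter is a modifier and at most one is exclusive."""
    if any(ch not in modifiers for ch in word):
        return False             # e.g. the 'n' in "unless" rules it out
    return sum(ch in exclusive for ch in word) <= 1


def ambiguous_words(words, modifiers, exclusive=frozenset()):
    """Words that look-ahead alone cannot tell apart from a modifier run."""
    return [w for w in words if could_be_modifiers(w, modifiers, exclusive)]


before = ambiguous_words(WORDS, TODAY)
after = ambiguous_words(WORDS, PROPOSED, EXCLUSIVE)
newly_ambiguous = [w for w in after if w not in before]

if __name__ == "__main__":
    print("ambiguous today:   ", before)
    print("newly ambiguous:   ", newly_ambiguous)
```

Under these assumptions "unless", "until" and "lt" drop out, as described above, but "gt" shows up as newly ambiguous once "t" is a modifier, and so does "le" once "l" is. Words such as "ge" are made up entirely of modifier letters already.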