
RFC: long regex pattern modifiers

Karl Williamson
August 27, 2014 00:12
I have mentioned in earlier posts the upcoming need to go beyond the
single-char pattern modifiers /msixpodualgcer.  (Some examples include
being able to override /i definitions, supporting user-defined Unicode
private-use properties, allowing one to say globally that \b really
should be \b{wb}, and others.)

I'm here proposing a syntax for doing this.  An example would be
  /(?mi{long-modifier}u: ... )/

A long modifier is simply anything enclosed in {} between the '(?' and 
the ':'.  Each such modifier would have its own pair of braces.  This is 
currently illegal syntax.  I do not see the need to accept these 
anywhere but in the infix (?:...) notation, and hence propose explicitly 
not to accept them elsewhere at this time.  We could expand at some 
point to allow long modifiers in the (?...) notation if there is demand.  
But I'm not sure I would ever want the postfix notation to allow long 
modifiers.
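
To check the "currently illegal" claim against an existing perl, a minimal sketch along these lines can be used.  The pattern is held in a string (the variable names are mine, purely for illustration) so that it is compiled at run time and the compile error can be trapped by eval:

```perl
use strict;
use warnings;

# Keep the proposed syntax in a string so that it is compiled at run
# time; a literal /.../ would abort the whole script at compile time.
my $source   = '(?mi{long-modifier}u: ... )';
my $compiled = eval { qr/$source/ };
my $error    = $@;

if (defined $compiled) {
    print "accepted: $compiled\n";
}
else {
    # Existing perls reject the '{' that follows the 'mi' flags.
    print "rejected: $error";
}
```

On a perl without long-modifier support I would expect the "rejected" branch, which is what makes the syntax available for this proposal.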

The syntax of what's enclosed in the {} is not specified now, except 
that nothing within it may break the current parsing of the pattern as 
a whole.  Hence braces would probably have to be balanced, etc.
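
As an illustration of the balanced-brace requirement, here is a rough sketch of how the text between '(?' and ':' might be split into single-char flags and long modifiers.  The function split_modifiers is hypothetical and is not how the regex compiler does or would do it; it only assumes that braces nest and that everything outside braces is a single-char flag:

```perl
use strict;
use warnings;

# Hypothetical helper: split the text found between '(?' and ':' into
# single-character flags and brace-delimited long modifiers.
# Returns (\@short, \@long); dies if the braces are unbalanced.
sub split_modifiers {
    my ($text) = @_;
    my (@short, @long);
    my $depth   = 0;     # current brace nesting level
    my $current = '';    # long modifier being accumulated

    for my $ch (split //, $text) {
        if ($depth == 0) {
            if    ($ch eq '{') { $depth = 1; $current = ''; }
            elsif ($ch eq '}') { die "unmatched '}' in modifiers\n"; }
            else               { push @short, $ch; }
        }
        elsif ($ch eq '{') {
            $depth++;
            $current .= $ch;
        }
        elsif ($ch eq '}') {
            $depth--;
            if ($depth == 0) { push @long, $current; }   # outermost brace closed
            else             { $current .= $ch; }
        }
        else {
            $current .= $ch;
        }
    }
    die "unmatched '{' in modifiers\n" if $depth;
    return (\@short, \@long);
}

my ($short, $long) = split_modifiers('mi{long-modifier}u');
print "short: @$short\n";    # short: m i u
print "long:  @$long\n";     # long:  long-modifier
```

Nested braces inside a long modifier are kept as part of its text, so whatever syntax is eventually chosen for the contents can be parsed separately.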

Long modifiers at this time would be essentially for other pragmas to 
fill in, and not for users.  We wouldn't document what they are, but 
obviously any stringification of the pattern would show them.  At some 
point in the future, after we are comfortable with this, we could add 
and document some that are intended to be user-specifiable.
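
For comparison, this is how the existing single-char modifiers already show up when a compiled pattern is stringified; long modifiers would presumably appear in the same place.  The sketch uses only current behavior (the (?^...) form dates from Perl 5.14):

```perl
use strict;
use warnings;

# A compiled pattern carries its modifiers with it when stringified.
my $re = qr/abc/i;
print "$re\n";    # (?^i:abc) on Perl 5.14 and later

# Interpolating it into another pattern keeps the /i behavior.
my $matched = 'xABCx' =~ /x${re}x/ ? 1 : 0;
print "matched: $matched\n";    # matched: 1
```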

It might be that the pragmas that generate long modifiers would be 
marked experimental at first so that this all could be removed.

One behavior that has been requested is to make the (?[...]) behavior 
work on regular [...] classes (that is, the extra syntax checks, etc., 
but not the set operations).  One could say something like

  use re 'X[]';

and within its scope, any regex that uses a regular bracketed character 
class would get the extended rules.  This would be implemented via a 
long modifier.  (X[] for extended bracketed class is something I just 
pulled out of the air, and I'm sure there are much better ways of 
spelling it.)
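
The lexical scoping described here is the same kind that the existing use re '/flags' form (Perl 5.14 and later) gives to single-char modifiers.  The sketch below shows only that existing behavior as an analogy; the 'X[]' spelling is not implemented anywhere:

```perl
use strict;
use warnings;

my ($inside, $outside);
{
    # Existing pragma form: /i becomes a default modifier for every
    # regex compiled within this lexical scope.
    use re '/i';
    $inside = 'ABC' =~ /abc/ ? 1 : 0;
}
# Outside the block the default no longer applies.
$outside = 'ABC' =~ /abc/ ? 1 : 0;

print "inside: $inside, outside: $outside\n";    # inside: 1, outside: 0
```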
