perl.perl5.porters | Postings from January 2013

Perl5 blead now contains experimental (?[ ]) regex sets feature

Karl Williamson
January 11, 2013 19:35
 From  commit 9d1a5160ac870eccea399973eaa9f9e3020b0833
  Author: Karl Williamson <>
  Date:   Thu Jan 10 17:06:04 2013 -0700

      New regex experimental feature: (?[ ])

      This is a fancier [bracketed] character class which allows set
      operations, such as intersection and subtraction.  The entry in perlre
      for this commit details its operation.
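As a rough illustration of the intersection and subtraction operations named above, here is a minimal sketch. It assumes perl 5.18 or later, where this feature shipped under the `experimental::regex_sets` warning category; details of the syntax in this blead commit may differ. The variable names and test characters are chosen only for illustration.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';  # silence the experimental warning (perl 5.18+)

# Intersection: characters that are both in the Thai script and digits.
my $thai_digit = qr/(?[ \p{Thai} & \p{Digit} ])/;

# Subtraction: ASCII lowercase letters minus the vowels.
# White space inside (?[ ]) is ignored, as if /x were in effect.
my $consonant = qr/(?[ [a-z] - [aeiou] ])/;

print "x is a consonant\n"          if "x" =~ $consonant;
print "e is not a consonant\n"      if "e" !~ $consonant;
print "U+0E54 is a Thai digit\n"    if "\x{0E54}" =~ $thai_digit;  # THAI DIGIT FOUR
print "ASCII 4 is not a Thai digit\n" if "4" !~ $thai_digit;
```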

      Besides extending regular expressions to handle this functionality,
      recommended by Unicode, the intent here is to do three things:

      1) Intersection has been simulated in regexes using zero-width
         look-around assertions, which are non-obvious.  This feature
         replaces those with a more powerful and clearer syntax, and the
         compiled regexes are smaller and faster, because everything is
         known at compile time.
      2) Set operations have also been simulated by using user-defined
         properties.  These are global, have security implications and
         restricted names, and don't allow expressions as complex as this
         new feature does.
      3) I hope that this feature will come to be viewed as a "better"
         bracketed character class.  Because there is no existing code
         that it must remain compatible with, I took advantage of that to
         forbid iffy practices allowed by the existing classes, while
         remaining mostly backwards compatible.  The main difference is
         that /x is always enabled, so white space can be used pretty much
         freely in these classes, but to match white space, it must be
         escaped.  Things that should have been illegal now are, such as
         \x{} and \x{abcdefghi}.  Things that look like a POSIX specifier
         but don't quite meet the rules now give an error instead of
         silently compiling as the union of the characters that compose
         them.  I may have omitted things; perhaps it should be an error
         for the same letter to occur twice in a row.  Since this is
         experimental, we can make such changes based on feedback from
         the field.
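To make points 1 and 2 concrete, here is a hedged sketch that builds the same set (Thai digits) three ways: with a look-around assertion, with a user-defined property, and with the new syntax. It assumes perl 5.18 or later; `IsThaiDigit` is a hypothetical property name chosen for this example. Note that it is an ordinary package subroutine, which is what makes such properties global.

```perl
use strict;
use warnings;
no warnings 'experimental::regex_sets';  # needed while the feature is experimental

# Workaround 1: simulate intersection with a zero-width lookahead.
# (?=\p{Thai}) checks the next character without consuming it, then
# \p{Digit} must match that same character.
my $via_lookahead = qr/(?=\p{Thai})\p{Digit}/;

# Workaround 2: a user-defined property (hypothetical name). Its name
# must begin with "In" or "Is", and each line of the returned string
# adds (+) or intersects (&) a built-in property.
sub IsThaiDigit {
    return "+utf8::Thai\n&utf8::Digit\n";
}
my $via_property = qr/\p{IsThaiDigit}/;

# New syntax: the intersection written directly with (?[ ]).
my $via_set = qr/(?[ \p{Thai} & \p{Digit} ])/;

# U+0E54 THAI DIGIT FOUR, ASCII "4", and U+0E01 THAI CHARACTER KO KAI.
for my $char ("\x{0E54}", "4", "\x{0E01}") {
    my @results = map { $char =~ $_ ? 1 : 0 }
                  $via_lookahead, $via_property, $via_set;
    printf "U+%04X: %s\n", ord $char, join(" ", @results);
}
```

All three columns should agree for each character. The lookahead version asks the regex engine to test two properties at match time, while the (?[ ]) version describes a single set that is fully determined when the pattern is compiled.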

      The intent is to keep this feature, since it is strongly recommended
      by Unicode.  The exact syntax is subject to change, which is why it
      is marked experimental.


Yves had proposed a somewhat different internal syntax.  I went with my
original design for several reasons, but we can discuss changes.
