develooper Front page | perl.perl5.porters | Postings from December 2010

PATCH for review: Add /a regex modifier

karl williamson
December 30, 2010 20:27
This series of commits adds /a to the regex bestiary.  It is also 
available at git://
branch regex

It is not intended for direct application, but since it is fairly 
extensive, I thought I ought to give people an opportunity to review 
(I'm thinking a week) and comment on it.  The final commit is large, and 
the one that eventually gets applied would be split up some.  I also 
intend to do some clean up of things, like perhaps shortening some 
variable names.

/a is /u plus it restricts \d, \s, \w, and [[:posix:]] to only match in 
the ASCII range; with corresponding changes to \D, \S, \W, and \b and \B.
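
To make the distinction concrete, here is a small sketch of how /a would differ from /u as described above. It assumes a perl in which both modifiers are available (they shipped in 5.14); the sample characters and the %result hash are only illustrative and are not taken from the patch or its tests.

```perl
use strict;
use warnings;

# U+0660 ARABIC-INDIC DIGIT ZERO: a digit under Unicode rules, but not ASCII.
my $arabic_zero = "\x{660}";
# U+00E9 LATIN SMALL LETTER E WITH ACUTE: a Latin1 word character, not ASCII.
my $e_acute = "\x{e9}";

# 1 means the pattern matched, 0 means it did not.
my %result = (
    digit_u   => ($arabic_zero =~ /\d/u          ? 1 : 0),
    digit_a   => ($arabic_zero =~ /\d/a          ? 1 : 0),
    word_u    => ($e_acute     =~ /\w/u          ? 1 : 0),
    word_a    => ($e_acute     =~ /\w/a          ? 1 : 0),
    nonword_a => ($e_acute     =~ /\W/a          ? 1 : 0),
    posix_a   => ($e_acute     =~ /[[:alpha:]]/a ? 1 : 0),
);

printf "%-10s %d\n", $_, $result{$_} for sort keys %result;
```

Under /u both characters match their class; under /a neither does, and \W correspondingly matches the accented letter.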

In developing this, I found two unrelated bugs, which are here fixed. 
I'll address the one with locales in a separate posting.  The other has 
to do with negated character classes, such as \W.  Using /d, \W should 
know about the Latin1 word characters when the target string is utf8. 
It hasn't until this patch, and several tests in the suite relied on 
that; there was also a TODO in the suite that this fixes.
For the purposes of this review, I chose to change to 
behave the way it always has instead of changing the tests, as I think 
that is the best option; for the others, I added the /a modifier in the 
tests to get them to work.  I have figured out a better way to fix 
this bug than in this review version, but since it is secondary to the 
main thrust, have left it in as-is.
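
As an illustration of the negated-class behavior in question, the sketch below checks \w and \W under /d against the same Latin1 character stored both as a native byte string and as a utf8 string. It assumes a perl with the /d modifier and the fix applied (any released 5.14 or later should do); the variable names are made up for this example and do not come from the test suite.

```perl
use strict;
use warnings;

# U+00E9 is a word character in Latin1, but not in ASCII.
my $native   = "\xe9";
my $upgraded = $native;
utf8::upgrade($upgraded);    # same character, now stored as utf8

# Under /d a non-utf8 string gets native (here ASCII) rules,
# while a utf8 string gets Unicode rules.
my $w_native   = $native   =~ /\w/d ? 1 : 0;
my $w_upgraded = $upgraded =~ /\w/d ? 1 : 0;

# \W has to be the exact complement of \w for the utf8 string.
my $nonword_upgraded = $upgraded =~ /\W/d ? 1 : 0;

print "native   \\w: $w_native\n";
print "upgraded \\w: $w_upgraded\n";
print "upgraded \\W: $nonword_upgraded\n";
```

The point to check is that \w and \W disagree for the utf8 string, that is, exactly one of them matches.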

This patch breaks .xs source code compatibility.  Some months ago, I was 
told to not be concerned about that.  There are 5 or 6 modules in CPAN 
that appear to possibly be affected.  The breakage has to do with 
finding what the character set (as I'm calling it) modifier in effect 
is: /a, /d, /l, or /u.  I've included a new API that both gets and sets 
the value; plus a function that gets the modifier letter.

I don't know what to do about B in this regard.  It was not relying on 
the old values specifically.

The added .t was checked against the source files to make sure it 
exercised all the cases there.

This patch does not include the documentation changes.