perl.perl5.porters | Postings from October 2014

Bringing the regex compiler into the current millennium.

October 23, 2014 08:15
Recent bugs and developments in the regex compiler have made me think that
the code we have is too close to unmaintainable.

The process of compiling a pattern is inefficient, difficult to work
with, extend or optimise, and IMO error prone.

I will start a project to rewrite the regex compiler in the next little
while. I will work on a branch (name to be announced), and I welcome any
contributions or support in this effort.

My objectives will be to:

1. Change the current awkward multi-pass lexer into a single-pass lexer
which constructs an AST. The AST is then analysed and optimised, and would
then be used to produce the final encoded program.

2. Split apart study_chunk(). I am not exactly sure how yet, but the aim is
to separate its various functions, which include multiple peephole
optimisations, analysis, and error detection.

3. Require as few changes to regexec.c as possible. Ideally no code in
regexec.c should have to change (unless we come up with a *really*
compelling reason to do so).

4. If possible I would like to have a process-wide cache of pattern
snippets that can be used to speed up compilation and reduce memory pressure
from regexes. There should be no need for a perl process to have more than
one /\s+/ pattern compiled, for instance.
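
To make objective 4 a bit more concrete, here is a minimal sketch of what a
process-wide snippet cache could look like. Everything in it is an assumption
for illustration: the snippet struct, snippet_cache_get() and
snippet_compile_count() are hypothetical names, the "compile" step is a
placeholder that only copies the pattern text, and locking for threaded
builds is left out entirely. It is not how regcomp.c works today.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compiled program; not perl's regexp struct. */
typedef struct snippet {
    char *source;           /* pattern text, used as the cache key */
    unsigned refcount;      /* number of users sharing this program */
    struct snippet *next;   /* next entry in the same hash bucket */
} snippet;

#define CACHE_BUCKETS 64
static snippet *cache[CACHE_BUCKETS];   /* process-wide table (no locking) */
static unsigned long compile_count;     /* how many real compiles happened */

/* 32-bit FNV-1a hash over the pattern text. */
static uint32_t hash_source(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Placeholder for the expensive compile step: it only copies the text. */
static snippet *compile_snippet(const char *source)
{
    snippet *sn = malloc(sizeof *sn);
    if (sn == NULL)
        abort();
    sn->source = malloc(strlen(source) + 1);
    if (sn->source == NULL)
        abort();
    strcpy(sn->source, source);
    sn->refcount = 0;
    sn->next = NULL;
    compile_count++;
    return sn;
}

/* Return the shared program for `source`, compiling only on a cache miss. */
snippet *snippet_cache_get(const char *source)
{
    uint32_t b;
    snippet *sn;

    assert(source != NULL);
    b = hash_source(source) % CACHE_BUCKETS;
    for (sn = cache[b]; sn != NULL; sn = sn->next)
        if (strcmp(sn->source, source) == 0)
            break;
    if (sn == NULL) {                   /* miss: compile once and remember */
        sn = compile_snippet(source);
        sn->next = cache[b];
        cache[b] = sn;
    }
    sn->refcount++;
    return sn;
}

unsigned long snippet_compile_count(void)
{
    return compile_count;
}
```

With something like this, two calls to snippet_cache_get("\\s+") hand back
the same pointer and trigger only one compile. A real version would also need
a release function, an eviction policy, and a key that includes the pattern
flags, none of which are shown here.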

For those who don't know, we do multiple lex passes over a pattern: definitely
3, and (IIRC) possibly as many as 4. We also do a very complicated
traversal of the final compiled program with the potential of lots of O(N)
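
By way of contrast with the multiple lex passes described above, the
single-pass design from objective 1 might be sketched roughly as follows. This
is a toy under stated assumptions, not a proposal for the real data
structures: it handles only literals, concatenation, '|', grouping parens and
postfix '*', the node type and the parse_pattern() and min_length() names are
made up for the example, and error reporting and freeing the tree are omitted.
The point is only that the pattern text is read once, left to right, and that
an analysis such as the minimum match length becomes a separate walk over the
tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node kinds for a toy regex subset. */
typedef enum { N_EMPTY, N_LIT, N_CAT, N_ALT, N_STAR } node_kind;

typedef struct node {
    node_kind kind;
    char ch;                /* N_LIT only */
    struct node *left;      /* N_CAT, N_ALT, N_STAR */
    struct node *right;     /* N_CAT, N_ALT */
} node;

static node *new_node(node_kind kind, char ch, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->kind = kind;
    n->ch = ch;
    n->left = left;
    n->right = right;
    return n;
}

static node *parse_alt(const char **p);

/* atom := '(' alt ')' | literal */
static node *parse_atom(const char **p)
{
    if (**p == '(') {
        node *n;
        (*p)++;
        n = parse_alt(p);
        if (**p == ')')     /* a missing ')' is silently tolerated here */
            (*p)++;
        return n;
    }
    return new_node(N_LIT, *(*p)++, NULL, NULL);
}

/* piece := atom '*'* */
static node *parse_piece(const char **p)
{
    node *n = parse_atom(p);
    while (**p == '*') {
        (*p)++;
        n = new_node(N_STAR, 0, n, NULL);
    }
    return n;
}

/* cat := piece* */
static node *parse_cat(const char **p)
{
    node *n = NULL;
    while (**p != '\0' && **p != '|' && **p != ')') {
        node *piece = parse_piece(p);
        n = n ? new_node(N_CAT, 0, n, piece) : piece;
    }
    return n ? n : new_node(N_EMPTY, 0, NULL, NULL);
}

/* alt := cat ('|' cat)* */
static node *parse_alt(const char **p)
{
    node *n = parse_cat(p);
    while (**p == '|') {
        (*p)++;
        n = new_node(N_ALT, 0, n, parse_cat(p));
    }
    return n;
}

/* Single left-to-right pass over the pattern text, producing the AST. */
node *parse_pattern(const char *pattern)
{
    assert(pattern != NULL);
    return parse_alt(&pattern);
}

/* Separate analysis pass: length of the shortest string that can match. */
size_t min_length(const node *n)
{
    size_t l, r;
    switch (n->kind) {
    case N_LIT:
        return 1;
    case N_CAT:
        return min_length(n->left) + min_length(n->right);
    case N_ALT:
        l = min_length(n->left);
        r = min_length(n->right);
        return l < r ? l : r;
    default:                /* N_EMPTY and N_STAR can match nothing */
        return 0;
    }
}
```

For example, min_length(parse_pattern("ab*c")) walks CAT(CAT(a, STAR(b)), c)
and returns 2. Further passes (peephole optimisation, error detection, final
code generation) would be additional independent walks over the same tree.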

In general my objective is not to improve performance at first, but rather
to simply make the process sane to understand, and easier to implement. I
believe that once the code has been restructured to be easier to understand
and work with we will find performance improvements come along for the
ride, or at the very least, are much easier to implement.

I welcome any interest in this project. Please let me know if you have any
thoughts or wish to contribute.


perl -Mre=debug -e "/just|another|perl|hacker/"
