Recent bugs and developments in the regex compiler have made me think that the code we have is too close to unmaintainable. The process of compiling a pattern is inefficient, difficult to work with, extend or optimise, and IMO error prone.

I will start a project to rewrite the regex compiler in the next little while. I will work on a branch (name to be announced), and I welcome any contributions or support in this effort.

My objectives will be to:

1. Change the current awkward multi-pass lexer into a single-pass lexer which constructs an AST. The AST would then be analysed and optimised, and then used to produce the final encoded program.

2. Split apart study_chunk(). I am not exactly sure how yet, but the aim is to separate its various functions, which include multiple peephole optimisations, analysis, and error detection.

3. Require as few changes to regexec.c as possible. Ideally no code in regexec.c should have to change. (Unless we come up with a *really* compelling reason to do so.)

4. If possible I would like to have a process-wide cache of pattern snippets that can be used to speed compilation and reduce memory pressure from regexes. There should be no need for a perl process to have more than one /\s+/ pattern compiled, for instance.

For those that don't know, we do multiple lex passes over a pattern: definitely 3, and (IIRC) possibly as many as 4. We also do a very complicated traversal of the final compiled program with the potential of lots of O(N) operations.

In general my objective is not to improve performance at first, but rather to simply make the process sane to understand, and easier to implement. I believe that once the code has been restructured to be easier to understand and work with, we will find performance improvements come along for the ride, or at the very least are much easier to implement.

I welcome any interest in this project. Please let me know if you have any thoughts or wish to contribute.
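
As a rough illustration of objective 4, and not a proposed design, a process-wide snippet cache could look something like the sketch below. Everything in it is hypothetical: the snippet struct, snippet_get() and snippet_release() are made-up names, the "compiled" payload is left out entirely, and a real version would need locking for threaded perls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a compiled pattern fragment; a real
 * compiler would also store the encoded program here. */
typedef struct snippet {
    char           *src;     /* pattern text, used as the cache key */
    unsigned        refcnt;  /* number of patterns sharing this snippet */
    struct snippet *next;    /* next entry in the same hash bucket */
} snippet;

#define SNIPPET_BUCKETS 64
static snippet *snippet_cache[SNIPPET_BUCKETS];

/* Simple string hash (djb2 style), reduced to a bucket index. */
static unsigned snippet_hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % SNIPPET_BUCKETS;
}

/* Return the shared snippet for src, creating it only on first use.
 * Returns NULL if memory allocation fails. */
snippet *snippet_get(const char *src) {
    assert(src != NULL);
    unsigned b = snippet_hash(src);
    for (snippet *s = snippet_cache[b]; s; s = s->next) {
        if (strcmp(s->src, src) == 0) {
            s->refcnt++;          /* cache hit: share the existing entry */
            return s;
        }
    }
    snippet *s = malloc(sizeof *s);
    if (!s) return NULL;
    size_t n = strlen(src) + 1;
    s->src = malloc(n);
    if (!s->src) { free(s); return NULL; }
    memcpy(s->src, src, n);
    s->refcnt = 1;
    s->next = snippet_cache[b];   /* push onto the front of the bucket */
    snippet_cache[b] = s;
    return s;
}

/* Drop one reference; unlink and free the entry once it is unused. */
void snippet_release(snippet *s) {
    unsigned b = snippet_hash(s->src);
    if (--s->refcnt > 0) return;
    for (snippet **p = &snippet_cache[b]; *p; p = &(*p)->next) {
        if (*p == s) { *p = s->next; break; }
    }
    free(s->src);
    free(s);
}
```

With something like this, two patterns that both contain \s+ would call snippet_get("\\s+") and receive the same pointer, so the fragment is compiled and stored once per process.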

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"