
Breaking up regcomp.c and maybe regexec.c

From: demerphq
Date: November 10, 2022 09:23
Subject: Breaking up regcomp.c and maybe regexec.c
Message ID: CANgJU+VdVUCa3wX3=LzVBYBKAU+kDckWjS+yJRMoe5bGS7Wd5g@mail.gmail.com

The regex engine is a huge amount of code divided into several
conceptual components along with tons of ancillary code, all crammed
basically into two files. regcomp.c has 26k lines, and regexec.c has
11k lines.

The nature of some of this code biases it towards large functions,
but even allowing for that, both files contain several large
functions that could arguably be separated into more reasonably
sized compilation units.

For instance, the regex compiler is divided into the toker/parser, a
peephole optimizer, debug code, and ancillary logic. All of the trie
code, for example, could be in its own file. The regex execution
engine is divided into a specialized "find the start" function and a
more generalized "regex execution engine". These are related, but
really need not be in the same file.

Some of this code could be reused. How many people know we have a
complete Levenshtein distance computation hidden in the regex
engine? (Besides Karl. :-) We could expose this to more areas of the
code (mistyped scalars come to mind), and maybe also via a builtin.
Since it is in the core, why not let people use it? For instance,
Merijn pointed out that CPAN could use it if it were available.
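
To make the reuse idea concrete, here is a minimal sketch of what a
standalone edit-distance helper could look like. It is NOT the code
that currently lives in the regex engine: the name lev_distance, its
signature, and the byte-wise comparison are all assumptions made for
illustration, and it implements the plain textbook Levenshtein
recurrence with two rolling rows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the routine in regcomp.c): textbook
 * Levenshtein distance between two NUL-terminated byte strings.
 * Uses two rolling rows, so memory is O(strlen(b)).
 * Returns (size_t)-1 if allocation fails. */
size_t lev_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);
    size_t *prev = malloc((lb + 1) * sizeof *prev);
    size_t *cur  = malloc((lb + 1) * sizeof *cur);

    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return (size_t)-1;
    }

    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = j;

    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;  /* delete all i bytes of a's prefix */
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            size_t del  = prev[j] + 1;        /* drop a[i-1]        */
            size_t ins  = cur[j - 1] + 1;     /* insert b[j-1]      */
            size_t sub  = prev[j - 1] + cost; /* match or substitute */
            size_t best = (del < ins) ? del : ins;
            cur[j] = (best < sub) ? best : sub;
        }
        /* The row just computed becomes the previous row. */
        size_t *tmp = prev;
        prev = cur;
        cur = tmp;
    }

    size_t result = prev[lb];
    free(prev);
    free(cur);
    return result;
}
```

For the mistyped-scalar case, a caller could compare an unknown name
against each known name and suggest the closest one. With this plain
variant, lev_distance("$coutn", "$count") is 2, because a
transposition is counted as two single-character edits.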

The counterargument to splitting the code is that we would lose some
sense of the history of the code as seen through a standard blame
invocation. Blame can be told to do the work to see past these
splits, but it is not always perfect at it.

Personally I think having more structure would make the code easier
to maintain and understand, but I am not sure I fully understand all
the factors that need considering when doing a reorg like this. I
have noticed that adding a new .c file to the code base is a pain,
but it seems manageable, if a bit tedious. Are there any other
concerns I should know about or take into account? Does anybody with
experience adding a new .c file have any guidance I can follow?

Does anybody have any thoughts on this? Relatively few people work on
the regex code. Does anybody have any strong feelings about this, or
any guidance to provide?

cheers,
Yves




-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
