develooper Front page | perl.perl5.porters | Postings from November 2022

Re: Breaking up regcomp.c and maybe regexec.c

November 12, 2022 11:03
On Sat, 12 Nov 2022 at 10:32, Dave Mitchell wrote:
> On Thu, Nov 10, 2022 at 12:31:41PM +0000, Paul "LeoNerd" Evans wrote:
> > As to naming, I wonder if we want to consider having things like
> > regtrie.c, regpeep.c, etc... Or whether it might make more sense to
> > move all the regexp engine stuff into a re/ subdirectory?
> I'd prefer not. Currently I often do
>     grep foo *.[ch]
> when hunting for things in the core (as opposed to in everything) and I
> can imagine constantly forgetting to search re/ (and any other additions).

Splitting it up is hard enough, so I decided not to burden myself with
the added complexities of having a subdir. :-)

> As for the split, I'm not particularly bothered either way. But if we do
> reorganise, then there are a few runtime things in regcomp.c that really
> ought to be in regexec.c, like Perl_reg_numbered_buff_fetch().

I guess when you say "in regexec.c" you mean "in the same compilation
unit as pregexec()"? E.g., would it be OK if I put all the run-time
utility stuff in one file (say regexec_util.c) and left pregexec() in
regexec.c, or is there a reason they should be in the same file
specifically? Are there any caveats that I should be aware of here?

Currently I have this:

$ wc -l reg*.[ch]
   3858 regcharclass.h
  16443 regcomp.c
   1627 regcomp_debug.c
   1434 regcomp.h
   1196 regcomp_internal.h
   1628 regcomp_invlist.c
   3804 regcomp_study.c
   1688 regcomp_trie.c
  11748 regexec.c
    947 regexp.h
     64 reginline.h
   2946 regnodes.h
  47383 total

All the new things currently have regcomp_xxx names. I haven't gotten
to regexec.c yet. I suspect a bunch of stuff which is in regcomp.h
(used widely) can long term be moved to regcomp_internal.h (new and
used only in regcomp.c-derived code), which would mean less to rebuild
after a regex-engine-specific change.

Aside: I am noticing that the whole PERL_IN_REGCOMP_C thing in
embed.fnc is showing its teeth, especially with inline functions.
Maybe this is specific to the regex engine, but we have a lot of code
wrapped in these kinds of guards, and untangling them when I add a new
file is a big part of the work involved. Inline functions, as I
mentioned, are troublesome because if you end up including them in the
wrong place the build fails outright, because the ARGS macros for
those functions aren't available. I can't help but wonder if life
would be better if the ARGS macros for inline functions were available
in all C code. So what if they don't get
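
To make the failure mode above concrete, here is a minimal sketch of
the pattern as I understand it. All DEMO_/demo_ names are hypothetical
stand-ins, not the real PERL_IN_* defines or PERL_ARGS_ASSERT_* macros,
and the layout only approximates what the generated headers do: the
argument-check macro is defined inside a per-file guard, while the
inline function that uses it lives in a header that other files may
also include.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the define a .c file sets before including
 * the core headers (the real ones look like PERL_IN_REGCOMP_C). */
#define DEMO_IN_REGCOMP_C

/* --- roughly what a generated prototype header might contain --- */
#if defined(DEMO_IN_REGCOMP_C) || defined(DEMO_IN_REGCOMP_TRIE_C)
/* The argument-check macro only exists inside the guard. */
#  define DEMO_ARGS_ASSERT_SUM_LENGTHS assert(lens)
#endif

/* --- roughly what a shared inline-function header might contain --- */
static inline size_t
demo_sum_lengths(const size_t *lens, size_t count)
{
    size_t total = 0;
    size_t i;

    /* If neither guard define above is set, this macro is undefined,
     * the name is an undeclared identifier, and the build fails. */
    DEMO_ARGS_ASSERT_SUM_LENGTHS;

    for (i = 0; i < count; i++)
        total += lens[i];
    return total;
}
```

Deleting the `#define DEMO_IN_REGCOMP_C` line reproduces the kind of
build failure described: the inline function body is still seen by the
compiler, but its assert macro is not.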

The other thing is that using specific defines for *each* file means
some of the guard clauses are super long. I can't help but wonder if
we shouldn't be approaching this issue in a different way, with
"interface groups" defined and/or requested by the different units of
code. Or perhaps, alternatively, by reworking how the guard clauses in
embed.fnc are handled entirely. A tool that scans our code and
generates the appropriate guard clauses would make refactoring our
code a lot easier. Also we could arrange that when we generate the
defines for things we only include static defines when in the correct
I might try out any of the above ideas so if anybody has thoughts on
this please let me know.


perl -Mre=debug -e "/just|another|perl|hacker/"
