Re: Bringing the regex compiler into the current millenium.

From: Dave Mitchell
Date: October 24, 2014 11:20
Subject: Re: Bringing the regex compiler into the current millenium.
Message ID: 20141024111953.GQ5204@iabyn.com

On Thu, Oct 23, 2014 at 04:36:20PM +0200, demerphq wrote:
> Do you feel that if I were to just ignore malloc costs to start that it
> would be reasonable to retrofit the slab stuff?

My main worry would be how doing lots of tiny mallocs and frees to
compile a regex would affect performance. Compiling and using just once a
big list of patterns is an all-too-common occurrence; e.g. I could imagine
someone writing code like

    while (<$usr_share_dict_words_fh>) {
        chomp;
        if ($foo =~ /\b\Q$_\E\b/) { ... }
    }

If that suddenly ran much slower, or caused much heap fragmentation, that
would be a Bad Thing.

But if it didn't, then retrofitting seems a reasonable thing to do.

> Well you could keep an eye on what I am doing and help me get the slab
> stuff set up to your satisfaction. TBH, that is the part I feel the least
> confident about. Guidance in terms of getting the basic foundations set up
> right would be very valuable, from there I feel pretty confident I can run
> a long way without needing too much help.

There are two obvious recent(ish) slab implementations, FC's OP slabs,
and my runtime regexec stack. Both have the property of coalescing lots
of little mallocs into a few big slab allocs, with a single big free-up
at the end (of the lifetime of the optree or the regex execution).
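
To make the "few big allocs, one big free" property concrete, here is a
minimal sketch of a fixed-size node pool in C. It is not taken from the OP
slab or regexec stack code; the names (node, slab, node_pool, pool_alloc,
pool_free_all) and the slab capacity are all hypothetical, chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NODES_PER_SLAB 64           /* hypothetical slab capacity */

/* Hypothetical fixed-size node; not a real perl structure. */
typedef struct node {
    int type;
    struct node *next;
} node;

typedef struct slab {
    struct slab *prev;              /* previously filled slab, or NULL */
    size_t used;                    /* nodes handed out from this slab */
    node nodes[NODES_PER_SLAB];
} slab;

typedef struct {
    slab *current;                  /* slab currently being filled */
    size_t slab_count;              /* number of malloc() calls so far */
} node_pool;

static void pool_init(node_pool *p)
{
    p->current = NULL;
    p->slab_count = 0;
}

/* Hand out one node; malloc() is called only when the current slab is full. */
static node *pool_alloc(node_pool *p)
{
    if (!p->current || p->current->used == NODES_PER_SLAB) {
        slab *s = malloc(sizeof *s);
        if (!s)
            return NULL;
        s->prev = p->current;
        s->used = 0;
        p->current = s;
        p->slab_count++;
    }
    assert(p->current->used < NODES_PER_SLAB);
    return &p->current->nodes[p->current->used++];
}

/* The single free-up at the end: release every slab in one pass. */
static void pool_free_all(node_pool *p)
{
    slab *s = p->current;
    while (s) {
        slab *prev = s->prev;
        free(s);
        s = prev;
    }
    p->current = NULL;
    p->slab_count = 0;
}
```

With these assumed sizes, allocating 100 nodes costs two malloc() calls
rather than 100, and individual nodes are never freed on their own.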

The main issue is how variable-length the individual AST nodes would
be. This is where I'm on hazy ground, but IIRC, most nodes currently are
of a roughly similar size (so could be handled as a fixed-size union),
while some nodes (like EXACT) have an arbitrary-length string or other
data appended. You could just slab the nodes and store pointers to
malloced string buffers, or maybe have a separate slab type(s) to hold
the extra data, or allocate variable sized chunks within a slab, or
whatever. I guess it All Depends.
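
As a rough illustration of the "variable sized chunks within a slab"
option, the C11 sketch below bump-allocates chunks of any size from a slab
and stores an EXACT-like node with its string appended. Everything here is
an assumption made for the example: arena, vslab, exact_node, new_exact and
the 4096-byte slab size are hypothetical and do not correspond to anything
in the perl source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SLAB_BYTES 4096             /* hypothetical default slab size */

typedef struct vslab {
    struct vslab *prev;             /* previously filled slab, or NULL */
    size_t used;                    /* bytes handed out from data[] */
    size_t cap;                     /* bytes available in data[] */
    max_align_t data[];             /* keeps chunk storage suitably aligned */
} vslab;

typedef struct {
    vslab *current;
} arena;

/* Round n up to the strictest alignment (overflow not handled here). */
static size_t round_up(size_t n)
{
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Carve n bytes out of the current slab, starting a new slab if needed. */
static void *arena_alloc(arena *a, size_t n)
{
    n = round_up(n);
    if (!a->current || a->current->cap - a->current->used < n) {
        size_t cap = n > SLAB_BYTES ? n : SLAB_BYTES;
        vslab *s = malloc(sizeof *s + cap);
        if (!s)
            return NULL;
        s->prev = a->current;
        s->used = 0;
        s->cap = cap;
        a->current = s;
    }
    void *p = (char *)a->current->data + a->current->used;
    a->current->used += n;
    return p;
}

/* Release every slab at once; chunks are never freed individually. */
static void arena_free_all(arena *a)
{
    vslab *s = a->current;
    while (s) {
        vslab *prev = s->prev;
        free(s);
        s = prev;
    }
    a->current = NULL;
}

/* Hypothetical EXACT-like node: fixed header plus appended string. */
typedef struct {
    int type;
    size_t len;
    char str[];
} exact_node;

static exact_node *new_exact(arena *a, const char *s)
{
    size_t len = strlen(s);
    exact_node *n = arena_alloc(a, sizeof *n + len + 1);
    if (!n)
        return NULL;
    n->type = 1;                    /* hypothetical tag for "EXACT" */
    n->len = len;
    memcpy(n->str, s, len + 1);     /* copy the string and its NUL */
    return n;
}
```

The trade-off against the fixed-size union approach is that nodes can no
longer be indexed or recycled by slot, but a node and its string stay
contiguous and no per-string malloc is needed.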

-- 
No matter how many dust sheets you use, you will get paint on the carpet.
