develooper Front page | perl.perl5.porters | Postings from September 2012

rxres_save etc (was Re: Fix for recursive substitution)

Thread Next
Nicholas Clark
September 14, 2012 03:27
rxres_save etc (was Re: Fix for recursive substitution)
Message ID:
So, as far as I can tell rxres_save is doing more work than it needs to.
With conditional code removed, it's like this:

    rxres_save(&cx->sb_rxres, rx);
    PL_curpm = pm;

Perl_rxres_save(pTHX_ void **rsp, REGEXP *rx)
    UV *p = (UV*)*rsp;
    U32 i;

    if (!p || p[1] < RX_NPARENS(rx)) {
	i = 6 + RX_NPARENS(rx) * 2;
	if (!p)
	    Newx(p, i, UV);
	    Renew(p, i, UV);
	*rsp = (void*)p;

    *p++ = PTR2UV(RX_MATCH_COPIED(rx) ? RX_SUBBEG(rx) : NULL);

    *p++ = RX_NPARENS(rx);

As far as I can tell

1) for any given rsp it's only ever called with the same regular expression
2) RX_NPARENS() doesn't change once a regular expression has been compiled

Hence it's doing more work than it needs to. I considered the attached
patch. But then I *also* wondered why is it doing everything as UVs? It's
like it's written in a style to use the save stack. So, should it be using
the save stack? Or should it instead by using a C structure? [Possibly the
sort that shows "unwarranted chumminess with the C implementation" :-(]

The code dates back to this message

On Fri, May 09, 1997 at 09:03:28PM -0400, Chip Salzenberg wrote:

> I've finally concluded that DBD::Oracle won't work until recursive
> substitution is finally fixed.  Thus the below patch, which is
> Purify-tested for memory usage errors, and which fixes the test case.
> (It also behaves reasonably with the original test case for recursive
> substitution that was tossed around earlier this year.)
> There is a tiny performance cost equal to one malloc/free for each
> substitution statement that succeeds at least once and that has a
> complex right side.  So C<s/a/b/> has no cost, but C<s/(\w+)/\U$1/>
> does, because it's complex: it's equivalent to C<s/(\w+)/uc $1/e>.
> BTW, the cost is *one* malloc/free per such C<s//> statement, no
> matter how many times that statement matches, so I can't imagine the
> cost would ever be noticed; it should be difficult even to measure.

It seems that the above two invariants always held, and that the structure
was never on the save stack. So, I'm not sure which changes described above
are the right ones to make. (Not changing anything *is* an option, but
obviously feels wrong as the current code is doing work it doesn't need to)

Bingo! Happened to stumble upon the most excellent csh bug I knew we'd had

    Csh does not rely on PWD.  Instead, it does a getcwd all by itself.
    It then prepends 'set cwd =' and evals the result.  Thus interesting
    things happen if the directory happens to have any csh metacharacters
    in its name:

Nicholas Clark
Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About