Re: refactoring of regex execution / calling
From: demerphq
Date: July 31, 2013 00:37
Subject: Re: refactoring of regex execution / calling
Message ID: CANgJU+ULYVhq84S26nRb39eu8hgKSrxfXqCRkveKYmFJfWdsMg@mail.gmail.com
On 30 July 2013 22:20, Dave Mitchell <davem@iabyn.com> wrote:
> I pushed this merge commit a couple of days ago. It's fairly
> self-explanatory. It was originally an attempt to fix intuit-only matches
> under COW, and grew into a 50 commit monster.
>
> commit e82485c19c70d922047c43d035a5e59a7c08ce67
> Merge: 8088f39 2bfbe30
> Author: David Mitchell <davem@iabyn.com>
> AuthorDate: Sun Jul 28 14:09:44 2013 +0100
> Commit: David Mitchell <davem@iabyn.com>
> CommitDate: Sun Jul 28 14:09:44 2013 +0100
>
> [MERGE] refactor pp_match(), pp_subst(), regexec()
>
> Notionally the regexec engine has a well-defined API.
> In practice, the caller of regexec() (typically pp_match() or pp_subst()),
> is required to do a lot of set-up before calling regexec(), and some
> post-processing afterwards; in particular to handle \G, to handle intuit,
> and to set up $& correctly after an intuit-only match.
>
> The series of commits in this branch refactors the code around these three
> functions so that all the regex "knowledge" is now contained within
> regexec() rather than in the calling pp functions. At the same time, the
> pp functions have been heavily cleaned up and simplified where possible.
> This reduces the LOC in pp_match() from 305 to 186.
>
> The most visible refactorisation changes are that:
>
> * the call to intuit is now done from regexec() rather than from pp*;
>
> * ditto the setting of $& on intuit-only matches;
>
> * all the extra setup for \G is now in a single block of code in regexec(),
> rather than being distributed haphazardly across all 3 functions;
>
> Along the way various things have been improved and bugs have been fixed:
>
> * intuit-only matches had been inadvertently disabled when COW was enabled;
> this is now fixed. (An intuit-only match is where intuit finding a suitable
> start position is sufficient to determine that the pattern has matched,
> e.g. a fixed string pattern /abc/ without captures);
>
> * intuit-only substitutions had never been enabled; they are now,
> e.g. s/foo/bar/g;
>
> * formerly, intuit was skipped in the presence of anchored \G; this is no
> longer the case, so that something like "aaaa" =~ /\G.*xx/ now fails
> quickly due to the missing "xx";
>
> * the COW code will try to reuse the COW copy SV on subsequent captures on
> the same regex and string, rather than freeing and reallocating.
>
> * substitutions will no longer permit themselves to iterate "backwards",
> e.g. with s/.(?=.\G)/x/g;
>
> * some obscure utf8 issues with s/// have been fixed;
>
> * some bugs with \G fixed (and probably new ones added)
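For anyone who wants to poke at the cases named in the list above, here is a minimal sketch of the user-visible behaviour only. It does not show whether intuit is actually used internally (that is an engine detail, and `use re 'debug'` is the place to look for it). The sample strings other than "aaaa" and all variable names are made up for illustration.

```perl
use strict;
use warnings;

# Fixed-string pattern without captures: the kind of match the commit
# message describes as a candidate for an intuit-only match.
my $text     = "xxabcxx";
my $fixed_ok = $text =~ /abc/;
my $matched  = $&;                    # text of the last successful match
my ($from, $to) = ($-[0], $+[0]);     # start and end offsets of that match

# Global fixed-string substitution, as in the s/foo/bar/g example.
my $subject = "foo baz foo";
my $count   = ($subject =~ s/foo/bar/g);   # number of replacements made

# Anchored \G against a string with no "xx" anywhere: this cannot match,
# whether it fails quickly or slowly.
my $target = "aaaa";
my $g_ok   = ($target =~ /\G.*xx/) ? 1 : 0;

# \G resumes at pos() from the previous //g match on the same string.
my $run = "aab";
my @positions;
while ($run =~ /\Ga/g) {
    push @positions, pos($run);
}

print "fixed match: '$matched' at offsets $from..$to\n";
print "substitution: $count replacements, result '$subject'\n";
print "anchored \\G match succeeded: $g_ok\n";
print "positions after each \\Ga match: @positions\n";
```

The results should be the same before and after the merge for these cases; what the refactoring changes is where the work happens and how quickly a failing case is rejected.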
Haven't looked at the patch yet, but this mail fills me with warmth and joy.
Thanks Dave.
Yves
--
perl -Mre=debug -e "/just|another|perl|hacker/"