develooper Front page | perl.perl5.porters | Postings from March 2013

Re: split patches working now, but it revealed a problem in the regex compilation

March 26, 2013 11:16
On 26 March 2013 12:00, Dave Mitchell <> wrote:
> On Tue, Mar 26, 2013 at 08:40:30AM +0100, demerphq wrote:
>> I have pushed a new yves/revert_splitwhite branch.
>> It passes all tests except for some which now fail in re/recompile.t
>> As far as I can tell these failures come from a change I did to fix
>> the problem that the caching logic was using the precompiled string
>> (meaning without "(?...:....)" wrapper) and not checking that the
>> compiled pattern and the uncompiled had the same regex flags.
>> I noticed this testing the behavior of
>> split $_ ? / / : ' ', $string for 1,0,1,0
>> which would not recompile, because the two patterns differ only in their flags.
>> My fix is apparently overly pessimistic, and causes perl to recompile
>> things like "\x{100}" more often than we should. This is because the
>> pattern starts off as utf8-off, but ends up as utf8-on.
>> At worst this means we recompile utf8 patterns more often.
>> Anyway, before I TODO the failing tests I thought I should let Dave M
>> know and see if he can come up with something.
> I'm not really an expert in this area; I just happened to notice during
> the re_eval work that it was buggily caching stuff because it was just
> comparing the pattern bytes, and not the utf8 flag; and further, I
> found that the recompile logic didn't have any tests so I added a basic
> test file.
> My general opinion is that we should favour correctness over speed; so
> if your fix improves correctness, but disables caching for a few cases,
> then that's fine by me.

I came up with a better patch. Can you please sanity check it for me?

commit aa92869cf6c53d9eb86a54300cc39720259c68cd
Author: Yves Orton <>
Date:   Tue Mar 26 12:09:48 2013 +0100

    preserve the original flags so pattern caching works properly

    This adds a new property to the regexp structure, "compflags",
    and related macros for accessing it. We preserve the original
    flags passed into the compilation process, so we can compare
    when we are trying to decide if we need to recompile.

    Things are a touch tricky as the UTF8 flag is handled specially.
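
For readers following along, the caching decision discussed in this thread can be sketched in C. This is only an illustration under assumptions, not the code from the commit: the sk_regexp struct, the SK_FLAG_* bits and the sk_* functions are all hypothetical names, and only the idea of a preserved "compflags" field comes from the commit message. It contrasts three checks: bytes only, bytes plus the post-compilation flags (one reading of the overly pessimistic fix), and bytes plus the flags originally passed in. How the real patch special-cases the UTF8 flag is not shown here; the sketch simply compares the original flags in full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for illustration; these are NOT perl's real values. */
#define SK_FLAG_UTF8        0x01u  /* pattern is utf8 */
#define SK_FLAG_SPLIT_WHITE 0x02u  /* assumed marker for the split ' ' form */

/* Hypothetical stand-in for a compiled regexp; NOT perl's struct regexp. */
typedef struct {
    const char *precomp;   /* pattern text without the "(?...:...)" wrapper */
    size_t      prelen;    /* length of precomp in bytes */
    uint32_t    compflags; /* flags as originally passed to compilation */
    uint32_t    flags;     /* flags after compilation; may have gained UTF8 */
} sk_regexp;

/* Check 1: compare the pattern bytes only. */
static bool sk_same_bytes(const sk_regexp *rx, const char *pat, size_t len)
{
    assert(rx != NULL && pat != NULL);
    return rx->prelen == len && memcmp(rx->precomp, pat, len) == 0;
}

/* Check 2: bytes plus the post-compilation flags. A pattern that was
 * upgraded to utf8 during compilation never matches its own input again. */
static bool sk_reuse_pessimistic(const sk_regexp *rx, const char *pat,
                                 size_t len, uint32_t new_flags)
{
    return sk_same_bytes(rx, pat, len) && rx->flags == new_flags;
}

/* Check 3: bytes plus the flags that were originally requested. */
static bool sk_reuse_compflags(const sk_regexp *rx, const char *pat,
                               size_t len, uint32_t new_flags)
{
    return sk_same_bytes(rx, pat, len) && rx->compflags == new_flags;
}
```

With a pattern such as "\x{100}" that is passed in utf8-off and upgraded during compilation, the second check would force a recompile every time while the third would allow reuse. With split's ' ' versus / /, where the bytes match but the flags do not, the first check wrongly reports a match and the third refuses to reuse.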


perl -Mre=debug -e "/just|another|perl|hacker/"
