
DAVEM TPF Grant April 2013 report

From: Dave Mitchell
Date: May 1, 2013 15:43
Subject: DAVEM TPF Grant April 2013 report
Message ID: 20130501154322.GF2216@iabyn.com

This month I worked on three 5.18 blocker tickets, all of them
regressions related to my jumbo re_eval fix back in 5.17.1.

The first, which I continued working on from last month, was the
"Regexp::Grammars" bug.

Basically, my reworking of the /(?{})/ implementation assumed that a
constant string segment like "foo" in /foo..../ would indeed be constant;
but in the presence of

    BEGIN { overload::constant qr => sub { bless [], ... } }

the "constant" can be anything but constant: it may, for example, be an
overloaded or REGEXP object. So the code that concatenated the pattern's
string segments didn't handle the extra cases, such as applying
overloading properly or extracting pre-compiled code blocks from qr//
objects.

This is now fixed.
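The following is a minimal sketch of the kind of hook involved. It assumes Perl 5.18 or later, and the handler and the @seen array are hypothetical, for illustration only. A qr constant handler is installed with overload::constant and called with each constant pattern segment. This one just records the segment and returns it unchanged as a plain string, whereas a handler like the one above may return an object instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical: constant pattern segments seen by the handler.
# Filled at compile time, so declared with 'our' to avoid being reset.
our @seen;

# overload::constant must run at compile time to affect the patterns
# that follow it in this file, hence the BEGIN block.
BEGIN {
    require overload;
    overload::constant('qr' => sub {
        my ($source) = @_;    # the segment's text as written
        push @seen, $source;
        return $source;       # plain string: pattern is left unchanged
    });
}

my $x = "bbb";
print "match 1\n" if "xabbbc" =~ /ab+c/;   # one constant segment
print "match 2\n" if "foobbb" =~ /foo$x/;  # constant 'foo', then $x
print "segments: @seen\n";
```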


The second issue concerned handling arrays embedded within literal
regexes, e.g. /...@a.../. This was partly to fix a regression from
5.16.x where, if @a contained a qr/...(?{...}).../, you would suddenly
need a "use re 'eval'" where you didn't need one before: RT #115004. But
it also improves the behaviour of array interpolation relative to 5.16.x,
especially with respect to closures and overloading.

Basically, the traditional behaviour of run-time patterns such as
/a${b}c/ was to concatenate the pattern components together, then pass
the resulting string to the regex engine. My 5.17.1 jumbo re_eval fix
changed that so that the list of args was preserved and passed as-is to
the regex engine. This meant that the engine could do things like extract
existing optrees from code blocks in something like
$b = qr/...(?{...}).../, rather than having to recompile them. So
closures work properly.
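Here is a minimal sketch of that closure behaviour. It assumes Perl 5.18 or later, and the variable names are illustrative only. Each qr// captures the $i of its own loop iteration. Interpolating one of those qr objects into a larger run-time pattern reuses its compiled code block rather than recompiling it from a string, so no "use re 'eval'" should be needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $got;    # hypothetical: set by whichever code block last ran
my @r;
for my $i (1 .. 3) {
    # Each qr// is a closure over the $i of its own iteration.
    push @r, qr/a(?{ $got = $i })/;
}

# Match directly against the second qr object.
"a" =~ $r[1] or die "no match";
print "direct: $got\n";

# Interpolate the third qr object into a bigger run-time pattern;
# its existing optree is reused, so 'use re "eval"' is not required.
my $third = $r[2];
"xa" =~ /^x$third/ or die "no match";
print "interpolated: $got\n";
```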

The thing I missed back then was applying the same new handling to arrays
as well as scalars. Until my fix, /a@{b}c/ would be parsed as

    regcomp('a', join($", @b), 'c')

This meant that the array was flattened and its contents stringified before
hitting the regex engine.

I've now changed it so that it is parsed as

    regcomp('a', @b, 'c')

(except that the array isn't flattened; rather, the AV itself is pushed
onto the stack, cf. push @b, ....).
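Here is a small sketch of the RT #115004 situation. It assumes Perl 5.18 or later, and @parts and $count are illustrative names. An array holding a qr// with a code block is interpolated into a run-time pattern without "use re 'eval'". The elements are still joined with $" (a space by default), so the example sets $" to the empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 0;    # hypothetical counter bumped by the code block

# The first element carries its own compiled code block.
my @parts = (qr/a(?{ $count++ })/, qr/b/);

{
    # Interpolated array elements are joined with $" (default ' ').
    local $" = '';
    "ab" =~ /^@parts$/ or die "no match";
}
print "code block ran $count time(s)\n";
```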

As well as handling closures properly, it also means that 'qr' overloading
is now handled with interpolated arrays as well as with scalars:

    use overload 'qr' => sub { return qr/a/ };
    my $o = bless [];
    my @a = ($o);
    "a" =~ /^$o$/; # always worked
    "a" =~ /^@a$/; # now works too

As well as adding the new handling of arrays, I heavily reworked the
pattern concatenation code within Perl_re_op_compile. This fixed a utf8
edge case and generally simplified the code, including allowing the
removal of a clunky if (0) { label: ... } construct.

This issue is now fully fixed.


The third issue concerned how caller() and __SUB__ work within regex code
blocks. It turns out that since my jumbo re_eval fix, code blocks in
literal matches were showing an extraneous stack frame. This
code:

    #!/usr/bin/perl
    use Carp;
    sub f3 { croak() }
    sub f2 { "a" =~ /a(?{f3(3)})/ }
    sub f1 { f2(2) }
    f1(1);

gives the following results:

5.16.3:
        main::f3(3) called at (re_eval 1) line 1
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7

5.17.10:
        main::f3(3) called at /home/davem/tmp/p line 5
        main::f2 called at /home/davem/tmp/p line 5
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7

blead:
        main::f3(3) called at /home/davem/tmp/p line 5
        main::f2(2) called at /home/davem/tmp/p line 6
        main::f1(1) called at /home/davem/tmp/p line 7
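The same frames can be inspected without Carp. Below is a sketch that assumes Perl 5.18 or later and uses a hypothetical @frames array. It walks caller() from inside f3 and records the sub name of each frame. With the fix, no extra main::f2 frame should appear between f3 and f2.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @frames;    # hypothetical: sub names collected from caller()

sub f3 {
    # Element 3 of caller()'s list is the sub name for that frame.
    my $level = 0;
    while (my @info = caller($level++)) {
        push @frames, $info[3];
    }
}
sub f2 { "a" =~ /a(?{ f3(3) })/ }
sub f1 { f2(2) }
f1(1);

print "@frames\n";
```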

In addition, the __SUB__ token, which returns a reference to the current
subroutine, was returning a ref to the hidden anonymous sub which is
now used to implement closure behaviour correctly for code blocks within
qr//'s; that is,

    $r = qr/foo(?{...})bar/;

is supposed to behave like

    $r = sub { /foo/ && do {...} && /bar/ }

as far as closures are concerned. The trouble is, the anon sub was never
designed to be called directly, and in fact perl SEGVs if you attempt to
call it. The workaround is to skip regex calls on the context stack when
looking for the CV for __SUB__. As a result, __SUB__ always returns the
sub which executed the pattern match, regardless of which direct code
blocks (/(?{})/) or indirect code blocks ($r = qr/(?{})/; /a$r/) have
been called. I have documented this as subject to change for now.
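Here is a minimal sketch of the indirect case described above. It assumes Perl 5.18 or later, and matcher, $qr and $got are hypothetical names. Because this behaviour is documented as subject to change, treat it as illustrating the current rule rather than a guarantee.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'current_sub';

my $got;    # hypothetical: what __SUB__ returned inside the code block

# An indirect code block: it lives in a qr// object.
my $qr = qr/(?{ $got = __SUB__ })/;

# The sub that actually executes the pattern match.
sub matcher { "a" =~ /a$qr/ }

matcher();
my $ok = defined $got && $got == \&matcher;
print $ok ? "__SUB__ returned matcher\n" : "__SUB__ returned something else\n";
```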


----------------------------

Over the last month I have averaged 8.8 hours per week.

As of 2013/04/30: since the beginning of the grant:

 164.5 weeks
1656.6 total hours
  10.1 average hours per week

There are 43 hours left on the grant.


Report for period 2013/04/01 to 2013/04/30 inclusive

SUMMARY
-------

    Effort (HH:MM):

        3:08 diagnosing bugs
       35:27 fixing bugs
        0:00 reviewing other people's bug fixes
        0:00 reviewing ticket histories
        0:00 reviewing the ticket queue (triage)
       -----
       38:35 TOTAL

    Numbers of tickets closed:

           3 tickets closed that have been worked on
           0 tickets closed related to bugs that have been fixed
           0 tickets closed that were reviewed but not worked on (triage)
       -----
           3 TOTAL


SHORT DETAIL
------------

 7:35 [perl #113928] caller behaving unexpectedly in re-evals
19:33 [perl #115004] perl 5.17.x can't use @var in regexp, but only $var
11:27 [perl #116823] Regexp::Grammars broken since 5.17.1

-- 
"You may not work around any technical limitations in the software"
    -- Windows Vista license


