Re: Proposed changes and to regular expression interfaces in core

From: Ævar Arnfjörð Bjarmason
Date: April 2, 2007 14:00
Subject: Re: Proposed changes and to regular expression interfaces in core
Message ID: 51dd1af80704021400t544ab8abm781a326240504774@mail.gmail.com

On 4/2/07, demerphq <demerphq@gmail.com> wrote:
> I have to say im not keen on this.
>
> It seems to me the proper way to do this is to make the routines in
> universal.c dispatch to the appropriate handler in the engine struct.

They'd have to be expanded somewhat. $+{foo} = "bar"; currently croaks
in Tie::Hash::NamedCapture::STORE, but there's no reason why a custom
engine might not want to allow assignments that modify the pattern.
That is, if the regexp_engine struct is to contain a complete
Tie::Hash interface.
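
To make the current behaviour concrete, here is a minimal sketch that assumes a perl of 5.10 or later with the stock engine. The variable names are only for illustration; the point is just that an assignment to %+ is rejected today:

```perl
use strict;
use warnings;

# Match with one named capture so that %+ is populated.
"hello" =~ /(?<foo>h.)/ or die "no match";

# Try to assign through %+; the stock implementation croaks here.
my $assigned = eval { $+{foo} = "bar"; 1 };
my $error    = $@;

print $assigned ? "assignment succeeded\n" : "assignment croaked: $error";
```

A custom engine that wanted writable named captures would need its own STORE hook to be reachable from here, which is the expansion being discussed.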

> BUT, the question I have is: why is this needed? Why is such an
> interface better than actually populating rx->paren_names()?

For the same reason that:

    $re->captures( sub {
        my ($re, $num) = @_;
        return "You got capture #$num";
    } );

is better than:

    $re->capture_offs(
        [ 0, 5 ],
        [ 5, 9 ],
    );

An interface which requires you to specify a hash of named captures in
advance, whose values are dual IV/PV vars with the PV being an embedded
I32* indicating which numbered buffers to use, isn't very friendly to
engines that don't work exactly like the perl engine. I'd have to wrap
that like:

    $re->named2numbered(
        bewb => 1,        # map to $1
        boob => [ 3, 4 ]  # map to $3 and $4
    );

Taking re::engine::PCR (a Pugs::Compiler::Rule wrapper, still
unreleased) as an example, that would be a PITA since I'd have to set
up ghost numbered buffers which would show up in @+. The ideal
interface to provide would be:

    $re->named_buff( sub {
        my ($re, $key) = @_;
        $match->{ $key };
    } );
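
The callback idea can be approximated in pure Perl today with a tied hash. This is only a sketch of the shape of such an interface, not the proposed core API: the CallbackCaptures package and the contents of $match are hypothetical names made up for the example.

```perl
use strict;
use warnings;

# Hypothetical read-only tied hash whose lookups go through a callback,
# so named captures need no mapping to numbered buffers.
package CallbackCaptures;

sub TIEHASH { my ($class, $cb) = @_; return bless { cb => $cb }, $class }
sub FETCH   { my ($self, $key) = @_; return $self->{cb}->($key) }
sub EXISTS  { my ($self, $key) = @_; return defined $self->{cb}->($key) }
sub STORE   { die "captures are read-only in this sketch\n" }

package main;

# Stand-in for whatever match object a foreign engine would produce.
my $match = { name => { first => "Larry", last => "Wall" } };

tie my %captures, 'CallbackCaptures', sub { $match->{ $_[0] } };

print $captures{name}{first}, "\n";    # prints "Larry"
```

Because the engine answers each lookup itself, a capture can be a nested structure and nothing extra has to appear in @+.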

> And actually this question applies retroactively to CALLREG_NAMEDBUF()
> hook as well. Why isnt it sufficient to require compiled regexes
> populate the hash and mandate a  fixed relationship between numbered
> buffers and named buffers?
>
> Im really worried that your thinking is aimed way to much at "how can
> I write an engine with the least hassle" and not enough at "how does
> the engine fit together with the rest of perl".

Guilty as charged, I'm lazy and I'm trying to do crazy things that
don't fit well with the perl regex engine :)

> Please can you explain in more detail what you want to do and why the
> existing interfaces arent enough.

This is needed for instance to wrap Perl 6 regular expression engines
such as the Parrot Grammar Engine and Pugs::Compiler::Rule which have
named captures independent of numbered captures. The current interface
mandates that each named capture be mapped to one or more numbered
captures.
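
As a small illustration of that mapping, assuming perl 5.10 or later with the stock engine, a name that is used twice is backed by two numbered buffers:

```perl
use strict;
use warnings;

# The name "pair" is attached to two groups, i.e. to buffers 1 and 2.
"ab" =~ /(?<pair>a)(?<pair>b)/ or die "no match";

my @numbered = ($1, $2);          # the underlying numbered buffers
my $leftmost = $+{pair};          # %+ yields the leftmost defined one
my @all      = @{ $-{pair} };     # %- yields every buffer with that name

print "numbered: @numbered\n";
print "leftmost: $leftmost\n";
print "all:      @all\n";
```

Every value reachable by name is also reachable by number, which is exactly the coupling an engine with independent named captures cannot express.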

Consider an engine that allowed $+{name}->{subname} instead of
$+{name}->[0]. This could even be used for alternate frontends for the
perl 5.10 engine. I was somewhat annoyed when I found out that
/(?<foo>.*)/ set both $1 and $+{foo}; with pluggable engine frontends
I don't have to be :)
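
For reference, a short sketch of the behaviour in question, again assuming perl 5.10 or later with the stock engine:

```perl
use strict;
use warnings;

# A single named group, and no explicitly numbered groups.
"hello world" =~ /(?<foo>\w+)/ or die "no match";

my $numbered = $1;        # the named group also fills buffer 1
my $named    = $+{foo};   # same text, reached by name
my $groups   = $#+;       # number of groups in the last match

print "numbered: $numbered\n";
print "named:    $named\n";
print "groups:   $groups\n";
```

An alternate frontend could choose to expose only the named value and leave the numbered buffers alone.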
