
Proposed changes and to regular expression interfaces in core

From: Ævar Arnfjörð Bjarmason
Date: March 28, 2007 19:28
Subject: Proposed changes and to regular expression interfaces in core
Message ID: 51dd1af80703281928j7596185cse7b28ff48d6be01a@mail.gmail.com

I've recently been working on writing re::engine::* modules, some of
which are on CPAN already under AVAR. The following are things that
I'd like to see fixed in core before 5.10 is out so that I can make
more complete regex engine wrappers that aren't limited to how perl
happens to behave.

== The package qr// is blessed into

I've already submitted a patch that allows the user to change this via
%^H. However, as dmq pointed out, it should be done via a regexp
callback function, since it's engine-specific:

    SV * Perl_reg_qr_package(pTHX_ const REGEXP * const rx) {
        return newSVpvs("Regexp");
    }

This callback would then be called by Perl_pp_qr to figure out what
package it should bless a C<< qr// >> construct into (after the regex
is compiled).
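
For illustration, here is a minimal Perl-level sketch of the behaviour in question: stock perl blesses a qr// object into Regexp, and user code can re-bless it by hand. The package name My::Regexp and its engine_name method are made up for this example. The proposed callback would let an engine make this choice itself instead of leaving it to user code.

```perl
use strict;
use warnings;

# Hypothetical package an engine might want its patterns blessed into.
{
    package My::Regexp;
    sub engine_name { return 'example-engine' }
}

# Stock perl blesses every qr// object into the "Regexp" package.
my $default_class = ref(qr/ab+c/);

# Re-blessing by hand is possible; the object still works as a pattern.
my $custom        = bless qr/ab+c/, 'My::Regexp';
my $custom_class  = ref($custom);
my $still_matches = ('xxabbbc' =~ $custom) ? 1 : 0;

print "$default_class $custom_class $still_matches ",
    $custom->engine_name, "\n";
```

Re-blessing after the fact has to be repeated for every pattern, which is presumably why an engine-level hook is the more attractive design.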

== re::* functions in universal.c

universal.c contains functions specific to the perl regex engine.
These were moved from re.xs as part of change 30517 (
http://www.nntp.perl.org/group/perl.perl5.changes/2007/03/msg18346.html
) so that miniperl could access them without loading dynamic modules.
They are used by Tie::Hash::NamedCapture, which is loaded
automatically by Perl_gv_fetchpvn_flags when C<< %+ >> and C<< %- >>
need to be accessed:

  $ perl5.9.5 -E 'BEGIN { @INC = () } %+'
  Can't locate Tie/Hash/NamedCapture.pm in @INC (@INC contains:) at -e line 1.

The problem is not that they live in universal.c, but that any engine
wishing to alter how named capture buffers work will have perl rx
engine semantics imposed on it when %+ and %- are accessed. This
should probably be a callback, like the qr// package name.
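
As a reminder of the semantics an alternative engine currently inherits, here is a small sketch of how perl's own engine populates %+ and %- when a name is used for more than one buffer. The pattern and input string are arbitrary examples.

```perl
use strict;
use warnings;
use 5.010;    # named captures, %+ and %- need perl 5.10 or later

my ($leftmost, @every);
if ('key=value' =~ /(?<word>\w+)=(?<word>\w+)/) {
    # %+ yields the leftmost defined buffer carrying that name.
    $leftmost = $+{word};

    # %- yields an array reference covering every buffer with that name.
    @every = @{ $-{word} };
}

print "$leftmost | @every\n";
```

An engine with different rules for duplicate names has no way to express them while these lookups are hard-wired to perl's behaviour.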

Tangential to the issue: is there any reason why these functions
can't live in NamedCapture.xs, which would get compiled and the
resulting object files used for the final (mini)perl executable?
That's more of a cleanup job, and universal.c is already a bit fat
for what it's supposed to do :)

To stray even further, XS(XS_re_is_regexp) appears to be dead code: it
was originally used in Tie::Hash::NamedCapture::TIEHASH, which has
since been changed not to use it. But I digress, a lot :)

== Separation of named and numbered match buffers

The way numbered buffers are currently implemented means I can do (in
re::engine::Plugin):

    $re->captures(
        sub {
            my ($re, $n) = @_;
            "Buffer #$n";
        }
    );

    ...

    $123; # returns "Buffer #123";

However, I've been unable to do the equivalent thing for C<<
$+{some_name} >> unless I set up the C<< rx->paren_names >> hash in
advance and mapped each name to a key or keys by having its value be
an IV/PV dualvar. Looking at it more closely now, though, I can't see
any reason why that would remain a problem if I can replace %+ and %-
as defined by Tie::Hash::NamedCapture with my own, as discussed above.
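
A rough sketch of what a callback-driven replacement could look like at the Perl level, mirroring the numbered-buffer example above. This ties an ordinary lexical hash rather than the real %+, and the My::NamedCapture package is a hypothetical stand-in, not an existing module or a proposed API.

```perl
use strict;
use warnings;

{
    package My::NamedCapture;    # hypothetical stand-in for illustration

    # Store the callback that computes a value for a given buffer name.
    sub TIEHASH {
        my ($class, $callback) = @_;
        return bless { callback => $callback }, $class;
    }

    # Every lookup is delegated to the callback; nothing is precomputed.
    sub FETCH {
        my ($self, $name) = @_;
        return $self->{callback}->($name);
    }
}

tie my %capture, 'My::NamedCapture', sub {
    my ($name) = @_;
    return "Buffer <$name>";
};

my $value = $capture{some_name};
print "$value\n";
```

Because the value is computed at lookup time, no table of names has to be filled in ahead of the match.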

== Other issues?

None of the wrappers I've written are complete at the moment. I
recently got s///g almost working (see the re::engine::Plan9 failures),
and split /re/, $str; works in none of my modules. However, there
shouldn't be any major issues left that I haven't discovered.
