
Re: Proposed changes and to regular expression interfaces in core

From: Ævar Arnfjörð Bjarmason
Date: April 9, 2007 21:14
Subject: Re: Proposed changes and to regular expression interfaces in core
Message ID: 51dd1af80704092114u30d408ay309606b60ba3827@mail.gmail.com

This incomplete patch shows the direction I'm going with this.

I added two new callbacks for numbered capture vars and gave the
existing one a new name. Now there's store/fetch/length for $1, where
the default store croaks when $1 is assigned to. This means that one
can write engines where C<$1 = "one"> works as expected (depending on
what you expect of course). The logic in Perl_magic_len() was moved to
a `length' callback that can be overridden.

I've also added callbacks to allow the regexp engine itself to
implement a tie interface for %+ and %-. These aren't working at the
moment but when they are the code that's currently in NamedCapture.pm
and universal.c will be part of the regex engine.

I'm going to polish the interface a little, like putting all the
named/numbered callbacks in their own struct and making that a member
of the regexp_engine struct. That way other engines can just drop in a
macro to use the default behavior. The prototypes on the callbacks
also need some going over, and I have to take a better look at magic
hashes so that I can get %+ and %- to work properly like $1 et al.
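
To make the struct-plus-macro idea concrete, here is a rough sketch in
plain C. It does not use the real perl headers: every toy_* name, the
TOY_DEFAULT_CAPTURE_CALLBACKS macro and the -1 return that stands in
for croaking are assumptions made up for illustration, not what the
patch actually contains.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; real code would use perl's regexp and pTHX_. */
typedef struct toy_regexp toy_regexp;

/* All numbered-capture callbacks grouped in one struct. */
typedef struct {
    const char *(*numbered_fetch)(const toy_regexp *rx, int paren, size_t *len);
    int         (*numbered_store)(toy_regexp *rx, int paren, const char *value);
    size_t      (*numbered_length)(const toy_regexp *rx, int paren);
} toy_capture_callbacks;

/* The engine struct carries the callback struct as a member. */
typedef struct {
    const char *name;
    toy_capture_callbacks captures;
} toy_regexp_engine;

struct toy_regexp {
    const toy_regexp_engine *engine;
    const char *subject;        /* string that was matched against */
    size_t start[4], end[4];    /* capture offsets, index 1 is $1 */
    int nparens;
};

/* Default fetch: point into the subject string, report the length. */
static const char *default_fetch(const toy_regexp *rx, int paren, size_t *len)
{
    if (paren < 1 || paren > rx->nparens) {
        *len = 0;
        return NULL;
    }
    *len = rx->end[paren] - rx->start[paren];
    return rx->subject + rx->start[paren];
}

/* Default store: refuse the assignment (-1 stands in for croaking). */
static int default_store(toy_regexp *rx, int paren, const char *value)
{
    (void)rx; (void)paren; (void)value;
    return -1;
}

/* Default length: same arithmetic as fetch, without using the pointer. */
static size_t default_length(const toy_regexp *rx, int paren)
{
    size_t len;
    (void)default_fetch(rx, paren, &len);
    return len;
}

/* An engine that wants the stock behavior just drops this in. */
#define TOY_DEFAULT_CAPTURE_CALLBACKS \
    { default_fetch, default_store, default_length }

static const toy_regexp_engine toy_engine = {
    "toy", TOY_DEFAULT_CAPTURE_CALLBACKS
};
```

An engine that wants C<$1 = "one"> to work would supply its own
numbered_store in place of default_store and keep the other two.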

----

Some further changes I might want to do include changing the comp
prototype from:

(pTHX_ char* exp, char* xend, U32 pm_flags);

to:

(pTHX_ SV**sv, char* exp, char* xend, U32 pm_flags);

This would allow engines to get an array from C< "foo" =~ [ qr/a/,
qr/b/ ] >. It's nice to have the option open not to always have to
stringify qr//. But of course the engine won't always be getting a
scalar so we'll still need exp & xend.
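
A minimal sketch of how an engine might use the extra argument,
falling back to exp/xend when no pattern object is passed. This is
plain C with made-up stand-ins (toy_pattern_list, toy_compiled,
toy_comp), and it takes a simple pointer rather than the SV** from the
prototype above, so treat it only as an illustration of the control
flow.

```c
#include <stddef.h>

typedef unsigned int toy_u32;   /* stands in for U32 */

/* Hypothetical stand-in for an SV holding a list of qr// objects. */
typedef struct {
    size_t count;
} toy_pattern_list;

typedef struct {
    size_t alternatives;   /* how many patterns the engine was given */
    size_t source_len;     /* length of the string source, if it was used */
} toy_compiled;

/* Proposed shape: pattern object first, string bounds kept as fallback. */
static toy_compiled toy_comp(const toy_pattern_list *obj,
                             const char *exp, const char *xend,
                             toy_u32 pm_flags)
{
    toy_compiled out = { 0, 0 };
    (void)pm_flags;
    if (obj != NULL) {
        /* The engine got the object itself: no stringification needed. */
        out.alternatives = obj->count;
    } else {
        /* No object available: compile from the exp..xend string. */
        out.alternatives = 1;
        out.source_len = (size_t)(xend - exp);
    }
    return out;
}
```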

Another thing would be to change regexp_paren_pair to use STRLEN
instead of I32. Capture offsets might go beyond 2**31 bytes, which a
signed I32 can't hold.
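
The range problem can be shown without any perl headers. In this
sketch int32_t stands in for I32 and size_t for STRLEN; the pair_*
struct names and fits_in_i32() are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Current layout, with int32_t standing in for I32. */
typedef struct { int32_t start, end; } pair_i32;

/* Proposed layout, with size_t standing in for STRLEN. */
typedef struct { size_t start, end; } pair_strlen;

/* Returns 1 if a byte offset can be stored in a signed 32-bit field. */
static int fits_in_i32(uint64_t offset)
{
    return offset <= (uint64_t)INT32_MAX;
}
```

An offset of 2**31 - 1 still fits in the signed field; 2**31 does not.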

I was also thinking of adding SvRX* functions/macros to the API, such
as SvRXOK() and a macro that gets the REGEXP out of a MAGIC SV (this
is duplicated in at least two places). It might also be wise to
provide macros for some of the existing perlreapi.pod stuff so that it
can be changed later while maintaining source-level compatibility.
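
Roughly what such macros could look like, sketched against toy types
instead of the real SV/MAGIC structs. The toy_* types and the TOY_SvRX
and TOY_SvRXOK names are assumptions for illustration; the only thing
borrowed from perl is that qr// magic is tagged with the character 'r'
(PERL_MAGIC_qr).

```c
#include <stddef.h>

typedef struct toy_regexp { int nparens; } toy_regexp;

/* Simplified magic chain entry. */
typedef struct toy_magic {
    struct toy_magic *next;
    char type;     /* 'r' marks qr// magic */
    void *obj;     /* for 'r' magic, the compiled regexp */
} toy_magic;

typedef struct { toy_magic *magic; } toy_sv;

/* Walk the magic chain once, in one place, instead of duplicating it. */
static toy_regexp *toy_sv_rx(const toy_sv *sv)
{
    const toy_magic *mg;
    if (sv == NULL)
        return NULL;
    for (mg = sv->magic; mg != NULL; mg = mg->next)
        if (mg->type == 'r')
            return (toy_regexp *)mg->obj;
    return NULL;
}

/* Macro wrappers, so callers do not depend on the struct layout. */
#define TOY_SvRX(sv)   toy_sv_rx(sv)
#define TOY_SvRXOK(sv) (toy_sv_rx(sv) != NULL)
```

Callers that go through the macros keep compiling if the lookup is
later moved out of magic, which is the source-level compatibility
point.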

But once this is all done there should be a pretty sane regex api in
core that re::engine::Plugin and others can target. Comments?:)
