develooper Front page | perl.perl5.porters | Postings from April 2007

Re: Proposed changes and to regular expression interfaces in core

Ævar Arnfjörð Bjarmason
April 10, 2007 12:11
On 4/10/07, Tels wrote:
> But what if you need negative integers, too? Expressing signed numbers in an
> unsigned type like size_t is quite awkward.

I'll explain this a bit better if only to get it clear in my own head.

The core engine expresses capture variables with an array of I32
start/end pairs. So a match like:

"bwbs" =~ /(.)(..)/;

Would result in the following structure:

regexp_paren_pair pairs[] = { {0, 3}, { 0, 1}, {1, 3}, { -1, -1 } };

I.e. $& = "bwb"; $1 = "b"; $2 = "wb" and $3 = undef, $4 = undef, ....
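
To make the layout concrete, here is a minimal C sketch of how such a
pair array could be walked to recover the captures. The paren_pair
struct and the capture_len() / copy_capture() helpers are hypothetical
stand-ins for illustration, not the core's actual definitions; only the
{start, end} layout and the -1 marker come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the core's pair type: signed 32-bit offsets. */
typedef struct {
    int32_t start;
    int32_t end;
} paren_pair;

/* Return the length of capture n, or -1 if that buffer did not take
 * part in the match (its start offset is -1). */
static int32_t capture_len(const paren_pair *pairs, size_t n)
{
    if (pairs[n].start == -1)
        return -1;
    return pairs[n].end - pairs[n].start;
}

/* Copy capture n of subject into out as a NUL-terminated string.
 * Returns 1 on success, 0 if the capture is unset or does not fit. */
static int copy_capture(const char *subject, const paren_pair *pairs,
                        size_t n, char *out, size_t outsz)
{
    int32_t len;

    assert(outsz > 0);              /* need room for the terminator */
    len = capture_len(pairs, n);
    if (len < 0 || (size_t)len >= outsz)
        return 0;
    memcpy(out, subject + pairs[n].start, (size_t)len);
    out[len] = '\0';
    return 1;
}
```

With the array above and the subject "bwbs", index 0 yields "bwb",
index 1 yields "b", index 2 yields "wb", and index 3 is reported as
unset.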

The only thing using an I32 gives us here is having -1 mark the end of
capture buffers. dmq had the idea of using 0 instead, which of course
would entail expressing what was {0, 3} as {1, 4}, unless anyone else
has any better ideas. Using a size_t-ish type such as STRLEN would
allow us to have giant capture vars and to wrap regexp libs that allow
them, such as the POSIX regex library.
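
A rough sketch of the 0-as-marker idea with an unsigned type, assuming
every offset is stored biased by one so that 0 is free to mean "unset".
The upair type and its helpers are invented for this example and are
not a proposal for the actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair type with unsigned offsets stored biased by one:
 * 0 means "unset", any other value v stands for the real offset v - 1. */
typedef struct {
    size_t start;
    size_t end;
} upair;

/* Build a stored pair from real offsets, e.g. {0, 3} becomes {1, 4}. */
static upair upair_make(size_t start, size_t end)
{
    upair p = { start + 1, end + 1 };
    return p;
}

/* A pair is set when its stored start is non-zero. */
static int upair_is_set(upair p)
{
    return p.start != 0;
}

/* Recover the real offsets; only valid for pairs that are set. */
static size_t upair_start(upair p)
{
    assert(upair_is_set(p));
    return p.start - 1;
}

static size_t upair_end(upair p)
{
    assert(upair_is_set(p));
    return p.end - 1;
}
```

The cost of this encoding is that every reader has to remember the
bias, and the single largest size_t value can no longer be used as an
offset.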

I32 is also used for functions such as Perl_reg_numbered_buff_fetch()
which take an I32 paren argument indicating what capture buffer should
be retrieved. -2 is used for $`, -1 for $' and 0 for $&. This limits
the number of capture vars to around 2**32/2.
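
The index convention can be sketched as follows. The fetch_buff()
function, the span32 type and the IDX_* names are hypothetical and much
simpler than Perl_reg_numbered_buff_fetch(); only the meanings of -2,
-1 and 0 are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the special paren indexes. */
enum { IDX_PREMATCH = -2, IDX_POSTMATCH = -1, IDX_FULLMATCH = 0 };

typedef struct {
    int32_t start;
    int32_t end;
} span32;

/* Resolve a paren index to a [start, end) span in the subject.
 * pairs[0] is the whole match; nparens is the number of capture groups.
 * Returns 1 and fills *out on success, 0 if the buffer is unset. */
static int fetch_buff(const span32 *pairs, int32_t nparens,
                      int32_t subject_len, int32_t paren, span32 *out)
{
    assert(paren >= IDX_PREMATCH);

    if (pairs[0].start == -1)           /* nothing matched at all */
        return 0;

    if (paren == IDX_PREMATCH) {        /* text before the match */
        out->start = 0;
        out->end = pairs[0].start;
        return 1;
    }
    if (paren == IDX_POSTMATCH) {       /* text after the match */
        out->start = pairs[0].end;
        out->end = subject_len;
        return 1;
    }
    if (paren > nparens || pairs[paren].start == -1)
        return 0;                       /* no such group, or unset */

    *out = pairs[paren];                /* 0 is the match, 1.. are groups */
    return 1;
}
```

Because paren is a signed 32-bit value and the negative range is
reserved for the special buffers, the highest group number that can be
asked for is INT32_MAX, i.e. 2**31 - 1.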

In any case, other parts of the interface present bigger issues, but
this is one of the things that would be nice to get right the first
time.
