perl.perl5.porters | Postings from April 2007

Re: Proposed changes to regular expression interfaces in core

April 10, 2007 12:25


On Tuesday 10 April 2007 19:11:44 Ævar Arnfjörð Bjarmason wrote:
> On 4/10/07, Tels wrote:
> > But what if you need negative integers, too? Expressing signed numbers
> > in an unsigned type like size_t is quite awkward.
> I'll explain this a bit better, if only to get it clear in my own head.
> The core engine expresses capture variables with an array of I32
> start/end pairs. So a match like:
> "bwbs" =~ /(.)(..)/;
> Would result in the following structure:
> regexp_paren_pair pairs[] = { {0, 3}, {0, 1}, {1, 3}, {-1, -1} };
> I.e. $& = "bwb"; $1 = "b"; $2 = "wb" and $3 = undef, $4 = undef, ....
> The only thing using an I32 gives us here is having -1 mark the end of
> the capture buffers. dmq had the idea of using 0 instead, which of course
> would entail expressing what was {0, 3} as {1, 4}. Unless anyone else
> has any better ideas.
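A minimal C sketch of the two layouts described above, for the same "bwbs" =~ /(.)(..)/ match. The struct names, field names and helper functions are hypothetical stand-ins, not the core's actual definitions (int32_t plays the role of I32, size_t the role of a STRLEN-like type). The first table uses real offsets with {-1, -1} as the terminator; the second stores every offset plus one so that {0, 0} can terminate the list, as in dmq's suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a start/end pair; int32_t plays the role of I32.
 * end is exclusive, so the buffer is subject[start .. end-1]. */
typedef struct {
    int32_t start;
    int32_t end;
} pair_i32;

/* Current layout: real offsets, {-1, -1} marks the end of the list.
 * Entry 0 is $&, entry n is $n. */
static const pair_i32 pairs_i32[] = { {0, 3}, {0, 1}, {1, 3}, {-1, -1} };

/* Proposed layout: unsigned offsets stored as offset + 1,
 * which frees {0, 0} to mark the end of the list. */
typedef struct {
    size_t start;
    size_t end;
} pair_shifted;

static const pair_shifted pairs_shifted[] = { {1, 4}, {1, 2}, {2, 4}, {0, 0} };

/* Number of entries before the {-1, -1} terminator. */
static size_t count_i32(const pair_i32 *p) {
    size_t n = 0;
    while (p[n].start != -1)
        n++;
    return n;
}

/* Number of entries before the {0, 0} terminator. */
static size_t count_shifted(const pair_shifted *p) {
    size_t n = 0;
    while (p[n].start != 0)
        n++;
    return n;
}

/* Copy entry idx of the current layout into out (NUL-terminated). */
static void copy_i32(const char *subject, const pair_i32 *p, size_t idx, char *out) {
    assert(p[idx].start >= 0 && p[idx].end >= p[idx].start);
    size_t len = (size_t)(p[idx].end - p[idx].start);
    memcpy(out, subject + p[idx].start, len);
    out[len] = '\0';
}

/* Copy entry idx of the shifted layout: undo the +1 on the start offset;
 * the length needs no correction because the shift cancels out. */
static void copy_shifted(const char *subject, const pair_shifted *p, size_t idx, char *out) {
    assert(p[idx].start > 0 && p[idx].end >= p[idx].start);
    size_t len = p[idx].end - p[idx].start;
    memcpy(out, subject + (p[idx].start - 1), len);
    out[len] = '\0';
}
```

With the shifted layout, decoding costs one subtraction on the start offset, and the length end - start comes out the same in both layouts.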

When working with size_t, you often end up needing to express -1, as it
is used for errors. Then you typically use:


This means you are limited to 2**N - 1 usable values (0 to 2**N - 2) instead of 2**N/2.
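A common form of this convention reserves the largest value of the type, written (size_t)-1 (which equals SIZE_MAX), as the error marker. A minimal sketch under that assumption; POS_NOT_FOUND and find_char() are hypothetical names, not perl API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Error marker: converting -1 to size_t yields SIZE_MAX, the largest value. */
#define POS_NOT_FOUND ((size_t)-1)

/* Hypothetical helper: offset of the first c in s, or POS_NOT_FOUND.
 * Valid offsets therefore range over 0 .. SIZE_MAX - 1. */
static size_t find_char(const char *s, char c) {
    assert(s != NULL);
    const char *hit = strchr(s, c);
    return hit != NULL ? (size_t)(hit - s) : POS_NOT_FOUND;
}
```

With an N-bit size_t this leaves 2**N - 1 distinct offsets (0 to 2**N - 2), whereas a signed N-bit type that keeps -1 for errors can only hold non-negative offsets up to 2**(N-1) - 1, i.e. 2**N/2 of them.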

> Using a size_t-ish type such as STRLEN would 
> allow us to have giant capture vars and to wrap regexp libs that allow
> them, such as the POSIX regex library.

Sounds cool.
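For comparison, the POSIX interface reports captures as regmatch_t entries whose rm_so/rm_eo offsets have the signed type regoff_t, with -1 for groups that did not participate. A minimal sketch running the same "bwbs" match through it; posix_match() is a hypothetical helper, while regcomp(), regexec() and regfree() are the standard <regex.h> calls:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Run pattern against subject and fill pm[0 .. n-1]; returns 0 on a match.
 * pm[0] is the whole match ($&), pm[k] is group k ($k); entries beyond
 * the pattern's groups are set to -1 by regexec(). */
static int posix_match(const char *pattern, const char *subject,
                       regmatch_t *pm, size_t n) {
    regex_t re;
    assert(pattern != NULL && subject != NULL);
    int rc = regcomp(&re, pattern, REG_EXTENDED);
    if (rc != 0)
        return rc;
    rc = regexec(&re, subject, n, pm, 0);
    regfree(&re);
    return rc;
}
```

A wrapper mapping such results onto the core's pair array would have to carry the -1 convention across, or translate it to whatever marker the new interface settles on.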

> I32 is also used for functions such as Perl_reg_numbered_buff_fetch(),
> which take an I32 paren argument indicating which capture buffer should
> be retrieved: -2 is used for $`, -1 for $' and 0 for $&. This limits
> the number of capture vars to around 2**32/2.

Or 2**64 - 3 with an unsigned 64-bit type :-)
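The signed paren argument lets the three special buffers share one index space with the numbered ones. A simplified sketch of such a lookup, assuming the start/end layout from the "bwbs" example; fetch_buff() and paren_pair are hypothetical and do not reproduce the real function's signature:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Special buffer indexes as described in the message. */
enum { IDX_PREMATCH = -2, IDX_POSTMATCH = -1, IDX_FULLMATCH = 0 };

/* Hypothetical stand-in for a start/end pair; end is exclusive. */
typedef struct { int32_t start, end; } paren_pair;

/* Copy buffer `paren` of a match on `subject` into `out`, which must hold
 * strlen(subject) + 1 bytes. pairs[0] is $&, pairs[n] is $n, and npairs
 * counts the entries. Returns the length copied, or -1 for undef. */
static int32_t fetch_buff(const char *subject, const paren_pair *pairs,
                          int32_t npairs, int32_t paren, char *out) {
    int32_t from, to;
    assert(npairs > 0);
    if (paren == IDX_PREMATCH) {          /* $`: text before the match */
        from = 0;
        to = pairs[0].start;
    } else if (paren == IDX_POSTMATCH) {  /* $': text after the match */
        from = pairs[0].end;
        to = (int32_t)strlen(subject);
    } else if (paren >= 0 && paren < npairs && pairs[paren].start != -1) {
        from = pairs[paren].start;        /* $& or $1, $2, ... */
        to = pairs[paren].end;
    } else {
        return -1;                        /* no such buffer: undef */
    }
    memcpy(out, subject + from, (size_t)(to - from));
    out[to - from] = '\0';
    return to - from;
}
```

Because -2 and -1 are taken by $` and $', the argument has to be signed, which is what caps an I32 at 2**31 - 1 numbered buffers.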

> In any case other parts of the interface present bigger issues, but
> this is one of the things that would be nice to get right the first
> time.

Sorry for the confusion, but what I meant is that, as an XS writer, I would also like
to have a portable way of saying I64. I know about U8, U32, etc., but I am
not sure whether I64 or U64 exist. If not, it would be cool to have them. If
they already exist, never mind.
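Where a C99 compiler is available, <stdint.h> and <inttypes.h> already provide portable 64-bit types and matching printf formats. A minimal sketch; my_I64 and my_U64 are hypothetical names chosen so as not to suggest anything about what perl's own headers define, which is the open question here:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical typedefs for illustration; not taken from perl's headers. */
typedef int64_t  my_I64;
typedef uint64_t my_U64;

/* Format a signed 64-bit value; PRId64 expands to the right conversion
 * specifier for int64_t on the current platform. Returns snprintf's result. */
static int format_i64(char *buf, size_t bufsz, my_I64 v) {
    assert(buf != NULL && bufsz > 0);
    return snprintf(buf, bufsz, "%" PRId64, v);
}

/* Same for the unsigned type, using PRIu64. */
static int format_u64(char *buf, size_t bufsz, my_U64 v) {
    assert(buf != NULL && bufsz > 0);
    return snprintf(buf, bufsz, "%" PRIu64, v);
}
```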

All the best,


-- 
 Signed on Tue Apr 10 21:23:51 2007 with key 0x93B84C15.

 "Duke Nukem Forever is a 1999 game and we think that timeframe matches
 very well with what we have planned for the game."

  -- George Broussard, 1998

