
Re: Proposed changes and to regular expression interfaces in core

From:
Tels
Date:
April 10, 2007 12:25
Subject:
Re: Proposed changes and to regular expression interfaces in core
Message ID:
200704102126.42263@bloodgate.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Moin,

On Tuesday 10 April 2007 19:11:44 Ævar Arnfjörð Bjarmason wrote:
> On 4/10/07, Tels <nospam-abuse@bloodgate.com> wrote:
> > But what if you need negative integers, too? Expressing signed numbers
> > in an unsigned type like size_t is quite awkward.
>
> I'll explain this a bit better if only to get it clear in my own head.
>
> The core engine expresses capture variables with an array of I32
> start/end pairs. So a match like:
>
> "bwbs" =~ /(.)(..)/;
>
> Would result in the following structure:
>
> regexp_paren_pair pairs[] = { {0, 3}, { 0, 1}, {1, 3}, { -1, -1 } };
>
> I.e. $& = "bwb"; $1 = "b"; $2 = "wb" and $3 = undef, $4 = undef, ....
>
> The only thing using an I32 gives us here is having -1 mark the end of
> capture buffers. dmq had the idea of using 0 instead which of course
> would entail expressing what was {0, 3} as {1, 4}. Unless anyone else
> has any better ideas. 
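
To make the layout above concrete, here is a minimal sketch in C of walking such a pair array. The names paren_pair, example_pairs and copy_capture are made up for illustration and are not the core's own definitions; the sketch only assumes what the message states, namely start/end offsets per buffer and a {-1, -1} entry marking the end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's start/end pair (not the real struct). */
typedef struct {
    int start;  /* offset of the first character of the capture */
    int end;    /* offset one past the last character */
} paren_pair;

/* Offsets for "bwbs" =~ /(.)(..)/, terminated by a {-1, -1} marker. */
static const paren_pair example_pairs[] = {
    {0, 3}, {0, 1}, {1, 3}, {-1, -1}
};

/*
 * Copy capture buffer n (0 means the whole match) into out.
 * Returns the capture length, or -1 if n lies at or beyond the
 * end marker or the output buffer is too small.
 */
static int copy_capture(const char *subject, const paren_pair *pairs,
                        int n, char *out, size_t outsize)
{
    int i, len;

    assert(subject != NULL && pairs != NULL && out != NULL);
    if (n < 0)
        return -1;
    /* Walk up to n so we never read past the end marker. */
    for (i = 0; i <= n; i++) {
        if (pairs[i].start < 0)
            return -1;
    }
    len = pairs[n].end - pairs[n].start;
    if (len < 0 || (size_t)len + 1 > outsize)
        return -1;
    memcpy(out, subject + pairs[n].start, (size_t)len);
    out[len] = '\0';
    return len;
}
```

With the subject "bwbs", buffer 0 yields "bwb", buffer 1 yields "b", buffer 2 yields "wb", and buffer 3 hits the end marker.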

When working with size_t, you often end up needing to express -1, since it
is commonly used to signal errors. Then you typically use:

	(size_t)-1

This means you are basically limited to 2**N-1 usable values (the top one is
taken by the error marker) instead of 2**N/2.
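
As an illustration of that idiom, here is a small sketch in C. NOT_FOUND and find_byte are hypothetical names chosen for this example; the only fact relied on is that converting -1 to size_t gives the largest size_t value, SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error marker: the largest size_t value, equal to SIZE_MAX. */
#define NOT_FOUND ((size_t)-1)

/*
 * Return the offset of the first occurrence of c in buf[0..len-1],
 * or NOT_FOUND if it does not occur. Because NOT_FOUND is reserved,
 * a valid offset can never be SIZE_MAX itself.
 */
static size_t find_byte(const char *buf, size_t len, char c)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == c)
            return i;
    }
    return NOT_FOUND;
}
```

Callers compare the result against NOT_FOUND rather than testing for a negative value, which is what makes the unsigned type awkward when real negative numbers are also needed.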

> Using a size_t-ish type such as STRLEN would 
> allow us to have giant capture vars and to wrap regexp libs that allow
> them, such as the POSIX regex library.

Sounds cool.

> I32 is also used for functions such as Perl_reg_numbered_buff_fetch()
> which take an I32 paren argument indicating what capture buffer should
> be retrieved. -2 is used for $`, -1 for $' and 0 for $&. This limits
> the number of capture vars to around 2**32/2.

Or then 2**64 - 3 :-)
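
A rough sketch in C of how a signed paren argument with those three special values might be resolved. This is not the body of Perl_reg_numbered_buff_fetch(); the enum names, offset_span and resolve_paren are all invented for this example, and it assumes buffer 0 holds the offsets of the whole match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the special indices described in the message. */
enum {
    IDX_PREMATCH  = -2,  /* $` : text before the match */
    IDX_POSTMATCH = -1,  /* $' : text after the match  */
    IDX_FULLMATCH = 0    /* $& : the whole match       */
};

/* Hypothetical start/end offset pair; {-1, -1} means "no such buffer". */
typedef struct {
    int start;
    int end;
} offset_span;

/*
 * Map a signed paren index to offsets within the subject string.
 * pairs[0] is the whole match, pairs[1..nparens] are the capture groups.
 */
static offset_span resolve_paren(const offset_span *pairs, int nparens,
                                 size_t subject_len, int paren)
{
    offset_span none = { -1, -1 };
    offset_span r;

    assert(pairs != NULL);
    switch (paren) {
    case IDX_PREMATCH:
        r.start = 0;
        r.end = pairs[0].start;
        return r;
    case IDX_POSTMATCH:
        r.start = pairs[0].end;
        r.end = (int)subject_len;
        return r;
    default:
        if (paren < 0 || paren > nparens)
            return none;
        return pairs[paren];
    }
}
```

Moving the index to an unsigned type would mean picking three other reserved values for the special buffers, which is where the "minus 3" comes from.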

> In any case other parts of the interface present bigger issues, but
> this is one of the things that would be nice to get right the first
> time.

Sorry for the confusion, but I meant that as an XS writer, I would also like
to have a portable way of saying I64. I know about U8, U32 etc., but I am
not sure if I64 or U64 exist. If not, it would be cool to have them. If
they already exist, never mind me.
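
For comparison, outside of Perl's own headers, C99 offers exact-width 64-bit types through <stdint.h>. The sketch below uses them under the made-up names xs_i64 and xs_u64; it says nothing about what Perl's headers define and assumes a C99 compiler that provides int64_t and uint64_t.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical typedef names for illustration, not Perl's own. */
typedef int64_t  xs_i64;
typedef uint64_t xs_u64;

/*
 * Length of a span given signed 64-bit start and end offsets.
 * A signed type leaves room for negative markers such as -1 while
 * still covering offsets far beyond the 32-bit range.
 */
static xs_i64 span_length(xs_i64 start, xs_i64 end)
{
    assert(end >= start);
    return end - start;
}
```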

All the best,

Tels

- -- 
 Signed on Tue Apr 10 21:23:51 2007 with key 0x93B84C15.
 Get one of my photo posters: http://bloodgate.com/posters
 PGP key on http://bloodgate.com/tels.asc or per email.

 "Duke Nukem Forever is a 1999 game and we think that timeframe matches
 very well with what we have planned for the game."

  -- George Broussard, 1998 (http://tinyurl.com/6m8nh)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)

iQEVAwUBRhwBEncLPEOTuEwVAQJNMQf/U3+XLhTNL/evtUu1LgYbUtjjfVpOL+rp
PnI5/DNwntnt64v1dxd569tMrlsT94vt7H8sIoS1fbPOfkJk336olIThi3dnkQrQ
jc24VBzid1Heo5P2XIMWdBBTuhQqSZQa5386RO67E7y9fS1cCniHNN2/rjPg+wXj
j+0kwSncWJ4KuroJ/mgv3/5go3bvruzOWWWXlPJSNJCLpijkxAqMNzekUNcF2cJ3
40f+uOCnXePxs64tWSESEjzqwk4hmrXdmWDBaJvU/7t9yiPDue5htCpkKkq7Ae/+
hKRr1k03epdMi+X1E6dWgf0ZQFK+mNt/kqJkQQZVWdAsww6lVGeY9A==
=WWTg
-----END PGP SIGNATURE-----