Re: re-engine with Perl 5.10

From: Ævar Arnfjörð Bjarmason
Date: July 30, 2008 00:55
Subject: Re: re-engine with Perl 5.10
Message ID: 51dd1af80807300054o6c19922br34e5495ea404a02c@mail.gmail.com

(I'm cc-ing this to perl5-porters because this is probably something
everyone writing a re::engine is going to run into at one time or
another and it would be nice to have the solution in a public archive)

On Tue, Jul 29, 2008 at 11:59 AM, François Perrad
<francois.perrad@gadz.org> wrote:
> I work on a re::engine::Lua module. I've already uploaded a first release on
> CPAN. And this development is hosted at
> http://code.google.com/p/re-engine-lua/ .
>
> Basically, I used your modules (code and test suite) as a template, and I
> read the perlreapi document.
> Now, I get the following issues:
> 1) match & substitute work, but split gives the message:
>        panic: sv_setpvn called with negative strlen
> 2) the g modifier (see t/s.t) causes a segmentation fault

Both of these problems are caused by you not setting
rx->offs[0].start/end to the correct values. I patched your code a bit
to generate some more debug output:

@@ -383,6 +384,8 @@
             rx->offs[0].start = s1 - ms.src_init;
             rx->offs[0].end   = res - ms.src_init;

+            warn("start/end: %d/%d", rx->offs[0].start, rx->offs[0].end);
+
 #ifdef DEBUG
             warn("match (%d) [%d-%d]\n", ms.level, rx->offs[0].start, rx->offs[0].end);
 #endif

And here's how your module and ::POSIX handle it:

sh-3.2$ perl -Mblib -Mre::engine::Lua -wle 'my ($a, $b, $c) = split
/:/, ":::";' 2>&1 | egrep ^"(panic|start)"
start/end: 0/1 at -e line 1.
start/end: 0/1 at -e line 1.
panic: sv_setpvn called with negative strlen at -e line 1.

sh-3.2$ perl -Mblib -Mre::engine::POSIX -wle 'my ($a, $b, $c) = split
/:/, ":::";' 2>&1
start/end: 0/1 at -e line 1.
start/end: 1/2 at -e line 1.
start/end: 2/3 at -e line 1.

As you can see, each regex match needs to advance the position it's
handling. This is true of both split and /g, since they both work by
calling the exec callback repeatedly. rx->offs is what is used to
keep track of where the previous match left off, so you have to make
sure it's always consistent or you'll run into internal errors.
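
To make the point above concrete, here is a minimal, self-contained C
sketch of that calling pattern. It is a toy model and not the perlreapi
interface: toy_pair, toy_exec and toy_match_all are hypothetical names
made up for illustration, and toy_pair only stands in for an
rx->offs[0] entry. The thing to notice is that the offsets are computed
relative to the beginning of the whole string, not relative to where
this particular search started, so a caller that resumes from the
previous end keeps moving forward.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for one rx->offs[] entry (hypothetical, not perl's struct). */
typedef struct { int start; int end; } toy_pair;

/*
 * Toy "exec": look for the single character ch in [from, strend).
 * On success, record the match offsets relative to strbeg and return 1;
 * return 0 if there is no match.
 */
static int toy_exec(const char *strbeg, const char *from,
                    const char *strend, char ch, toy_pair *offs)
{
    const char *p;

    assert(strbeg <= from && from <= strend);
    for (p = from; p < strend; p++) {
        if (*p == ch) {
            offs->start = (int)(p - strbeg);
            offs->end   = (int)(p + 1 - strbeg);
            return 1;
        }
    }
    return 0;
}

/*
 * Toy split-like caller: calls toy_exec repeatedly, resuming each time
 * at the end offset the previous call reported. Stores at most max
 * pairs in out and returns how many matches were recorded.
 */
static int toy_match_all(const char *s, char ch, toy_pair *out, int max)
{
    const char *strend = s + strlen(s);
    const char *from = s;
    int n = 0;

    while (n < max && toy_exec(s, from, strend, ch, &out[n])) {
        from = s + out[n].end;   /* only advances if offsets are sane */
        n++;
    }
    return n;
}
```

Calling toy_match_all(":::", ':', out, 8) records 0/1, 1/2 and 2/3, the
same sequence as the ::POSIX output above. If the search start were
passed as the first argument of toy_exec instead of the string start,
every match would be reported as 0/1, which resembles the Lua output;
whether that is exactly what goes wrong in re::engine::Lua is only an
assumption here.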

Another thing I noticed is that you have this in your exec method:

            rx->nparens = rx->lastparen = rx->lastcloseparen = ms.level;
            Newxz(rx->offs, ms.level + 1, regexp_paren_pair);

I allocate rx->offs in the comp method in all my engines and assign to
nparens there. I don't remember if that's required but I wouldn't be
surprised if you were to run into problems because of this. The
perlreapi interface essentially consists of you setting a bunch of
variables which the core will read at its discretion. If you don't set
some of them, or set them at the wrong time (e.g. in exec rather than
comp), that might cause something to panic later on.

Again, I don't know if *this* specific issue is going to be a problem,
but it's something you should keep in mind if you get strange errors.
I spent a day debugging some odd issue in a re::engine which, as it
turned out, was caused by me forgetting to set some seemingly
unrelated variable, which caused perl to run amok.
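
The comp/exec division of labour described above could be sketched
roughly like this. It is a standalone toy model, not real XS code:
toy_regexp, toy_comp, toy_exec_fill and toy_free are hypothetical
names, calloc stands in for Newxz, and using -1 for "group not set" is
just the convention chosen for this sketch. The idea is simply that the
number of groups and the offs array are fixed once at compile time, and
the match step only writes into slots that already exist.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for the real structures (hypothetical). */
typedef struct { int start; int end; } toy_pair;

typedef struct {
    int       nparens;  /* number of capture groups, fixed at compile time */
    toy_pair *offs;     /* nparens + 1 slots; slot 0 is the whole match */
} toy_regexp;

/* "comp": decide nparens and allocate offs exactly once, zero-filled. */
static toy_regexp *toy_comp(int nparens)
{
    toy_regexp *rx = calloc(1, sizeof *rx);

    if (rx == NULL)
        return NULL;
    rx->nparens = nparens;
    rx->offs = calloc((size_t)nparens + 1, sizeof *rx->offs);
    if (rx->offs == NULL) {
        free(rx);
        return NULL;
    }
    return rx;
}

/* "exec": never allocates, only fills in the slots comp set up. */
static void toy_exec_fill(toy_regexp *rx, int start, int end)
{
    int i;

    assert(rx != NULL && rx->offs != NULL);
    rx->offs[0].start = start;
    rx->offs[0].end   = end;
    for (i = 1; i <= rx->nparens; i++) {
        rx->offs[i].start = -1;   /* toy convention: group not set */
        rx->offs[i].end   = -1;
    }
}

/* Release everything toy_comp allocated. */
static void toy_free(toy_regexp *rx)
{
    if (rx != NULL) {
        free(rx->offs);
        free(rx);
    }
}
```

With this layout, repeated calls to toy_exec_fill reuse the same offs
array, so nothing is reallocated between matches and nparens never
changes after toy_comp returns.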
