Re: Regex inside regex

From: Chas. Owens
Date: July 25, 2009 09:43
Subject: Re: Regex inside regex
Message ID: 58ce48dc0907250943p64994721o5b444479e11b4f3f@mail.gmail.com

On Sat, Jul 25, 2009 at 12:07, Craig A. Berry<craig.a.berry@gmail.com> wrote:
> On Sat, Jul 25, 2009 at 10:13 AM, Bram<p5p@perl.wizbit.be> wrote:
>> Currently calling a regex inside a regex results in memory corruption in the
>> regex variable.
>>
>> This can result in: segmentation faults, out of memory errors, incorrect
>> values, ... meaning undefined/unexpected behaviour.
>
>
>> Is it possible to turn this into defined/expected behaviour?
>> For example by adding a panic/run time error/... when a regex is being
>> started when a regex is already running? (read: what should be done to
>> accomplish that?)
>
> I don't think we can simply disable a regex within a regex.  Crazy as
> it sounds, it's apparently expected to work. I believe that's the
> whole point of the EVAL case in S_regmatch in regexec.c.  You can see
> that it starts another Perl op with CALLRUNOPS inside of the op that's
> already running.

It doesn't actually sound that crazy.  If you allow arbitrary code to
be called from inside the regex via (?{}) then you are going to wind
up calling a function that uses a regex.  In a weird case of
synchronicity, I ran into this same problem this morning trying to
create a regex that would match any Unicode digit character whose
decimal value was 3.  My first cut used
<blatant_plug>Unicode::Digits</blatant_plug> to get the decimal value:

/(\d)(?(?{3 != digits_to_int $^N})(*FAIL))/

This worked fine until I tried to match two different characters this way:

/(\d)(?(?{3 != digits_to_int $^N})(*FAIL))(\d)(?(?{0 != digits_to_int $^N})(*FAIL))/

Here we should match "30", "\x{1813}\x{1810}", etc., but it only
matches "33" (and other equivalent Unicode digits) because $^N is
still equal to the first match for some reason.  The digits_to_int
function uses regexes, so I assumed that was causing the problem.  I
then switched to

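use Unicode::UCD qw(charinfo);

# Return the decimal digit value of a single character.
sub digit {
        my $digit    = ord shift;         # code point of the character
        my $charinfo = charinfo($digit);  # Unicode properties for that code point
        return $charinfo->{digit};
}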

"30" =~ /(\d)(?(?{3 != digit $^N})(*FAIL))(\d)(?(?{0 != digit $^N})(*FAIL))/

Which failed even more miserably (which is odd since digits_to_int is
basically that plus a bunch of error checking).

I played with it for a couple hours more before finally throwing my
hands up in disgust (experimental features probably shouldn't be used
anyway).

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
