Re: Regex inside regex
From: Chas. Owens
Date: July 25, 2009 09:43
Subject: Re: Regex inside regex
Message ID: 58ce48dc0907250943p64994721o5b444479e11b4f3f@mail.gmail.com
On Sat, Jul 25, 2009 at 12:07, Craig A. Berry<craig.a.berry@gmail.com> wrote:
> On Sat, Jul 25, 2009 at 10:13 AM, Bram<p5p@perl.wizbit.be> wrote:
>> Currently calling a regex inside a regex results in memory corruption in the
>> regex variable.
>>
>> This can result in: segmentation faults, out of memory errors, incorrect
>> values, ... meaning undefined/unexpected behaviour.
>
>
>> Is it possible to turn this into defined/expected behaviour?
>> For example by adding a panic/run time error/... when a regex is being
>> started when a regex is already running? (read: what should be done to
>> accomplish that?)
>
> I don't think we can simply disable a regex within a regex. Crazy as
> it sounds, it's apparently expected to work. I believe that's the
> whole point of the EVAL case in S_regmatch in regexec.c. You can see
> that it starts another Perl op with CALLRUNOPS inside of the op that's
> already running.
It doesn't actually sound that crazy. If you allow arbitrary code to
be called from inside the regex via (?{}) then you are going to wind
up calling a function that uses a regex. In a weird case of
synchronicity, I ran into this same problem this morning trying to
create a regex that would match any Unicode digit character whose
decimal value was 3. My first cut used
<blatant_plug>Unicode::Digits</blatant_plug> to get the decimal value:
/(\d)(?(?{3 != digits_to_int $^N})(*FAIL))/
This worked fine until I tried to match two different characters this way:
/(\d)(?(?{3 != digits_to_int $^N})(*FAIL))(\d)(?(?{0 != digits_to_int $^N})(*FAIL))/
Here we should match "30", "\x{1813}\x{1810}", etc., but it only
matches "33" (and other equivalent Unicode digits) because $^N is
still equal to the first match for some reason. The digits_to_int
function uses regexes, so I assumed that was causing the problem. I
then switched to
use Unicode::UCD qw(charinfo);

# Return the decimal digit value of a single character.
sub digit {
    my $digit = ord shift;
    my $charinfo = charinfo($digit);
    return $charinfo->{digit};
}

"30" =~ /(\d)(?(?{3 != digit $^N})(*FAIL))(\d)(?(?{0 != digit $^N})(*FAIL))/
That failed even more miserably, which is odd since digits_to_int is
basically that plus a bunch of error checking.
I played with it for a couple hours more before finally throwing my
hands up in disgust (experimental features probably shouldn't be used
anyway).
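
For comparison, here is a minimal sketch of one way to get the same result
without running any code inside the regex: capture the two digits first, copy
the captures, and only then look up their values. This is not something I
tested against the failing cases above, just an illustration. The names
digit_value and is_thirty are made up for this example, and it assumes that
charinfo from Unicode::UCD reports the decimal value in its "digit" field.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

# Hypothetical helper: decimal value of one Unicode digit character,
# or undef if the character has no decimal digit value.
sub digit_value {
    my ($char) = @_;
    my $info = charinfo(ord $char);
    return undef unless $info && length $info->{digit};
    return $info->{digit};
}

# Hypothetical helper: true if $string is exactly two digit characters
# whose decimal values are 3 and 0, in that order.
sub is_thirty {
    my ($string) = @_;
    return 0 unless $string =~ /\A(\d)(\d)\z/;

    # Copy the captures before calling anything else, since the lookup
    # code may run regexes of its own and overwrite $1 and $2.
    my ($first, $second) = ($1, $2);

    my $v1 = digit_value($first);
    my $v2 = digit_value($second);
    return (defined $v1 && defined $v2 && $v1 == 3 && $v2 == 0) ? 1 : 0;
}
```

Calling is_thirty("30") or is_thirty("\x{1813}\x{1810}") should return true,
while is_thirty("33") should not. The cost is that the value check is no
longer part of the pattern, so it cannot take part in backtracking the way
the (?(?{...})(*FAIL)) version was meant to.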
--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.