Re: Bringing the regex compiler into the current millenium.

From: Christian Millour
Date: June 26, 2015 18:01
Subject: Re: Bringing the regex compiler into the current millenium.
Message ID: 558D9373.7050702@abtela.com

On 23/10/2014 21:54, demerphq wrote:
> I added support for maxlen earlier this year as part of working toward
> making $/ support regexes (pretty much the same use case you mention).
> We now set flags to determine if the regex is potentially infinite
> (RXf_UNBOUNDED_QUANTIFIER), or if not we calculate the maxlen. It should
> be in Perl 5.19.9 and later. (maxlen is meaningless when
> RXf_UNBOUNDED_QUANTIFIER is set).
>
> $ ./perl -Ilib -Mre=Debug,OPTIMISE,DUMP,FLAGS -e'/fo+o/'
> Compiling REx "fo+o"
> first:>  1: EXACT <f> (3) [ ]
> Peep>  1: EXACT <f> (3) [ SCF_DO_SUBSTR SCF_DO_STCLASS_AND
> SCF_DO_STCLASS SCF_WHILEM_VISITED_POS ]
>    join>  1: EXACT <f> (3)
> Peep>  3: PLUS (6) [ SCF_DO_SUBSTR SCF_WHILEM_VISITED_POS ]
>    Peep>  4: EXACT <o> (0) [ SCF_DO_SUBSTR SCF_WHILEM_VISITED_POS ]
>      join>  4: EXACT <o> (0)
> Peep>  6: EXACT <o> (8) [ SCF_DO_SUBSTR SCF_WHILEM_VISITED_POS ]
>    join>  6: EXACT <o> (8)
> minlen: 3 r->minlen:0 maxlen:0
> Final program:
>     1: EXACT <f> (3)
>     3: PLUS (6)
>     4:   EXACT <o> (0)
>     6: EXACT <o> (8)
>     8: END (0)
> anchored "fo" at 0 floating "oo" at 1..9223372036854775807 (checking
> floating) minlen 3
> r->extflags: UNBOUNDED_QUANTIFIER_SEEN USE_INTUIT_NOML USE_INTUIT_ML
> r->intflags: [none-set]
> Freeing REx: "fo+o"
>
> $ ./perl -Ilib -Mre=Debug,OPTIMISE,DUMP,FLAGS -e'/foo/'
> Compiling REx "foo"
> first:>  1: EXACT <foo> (3) [ ]
> Peep>  1: EXACT <foo> (3) [ SCF_DO_SUBSTR SCF_DO_STCLASS_AND
> SCF_DO_STCLASS SCF_WHILEM_VISITED_POS ]
>    join>  1: EXACT <foo> (3)
> minlen: 3 r->minlen:0 maxlen:3
> Final program:
>     1: EXACT <foo> (3)
>     3: END (0)
> anchored "foo" at 0 (checking anchored isall) minlen 3
> r->extflags: CHECK_ALL USE_INTUIT_NOML USE_INTUIT_ML
> r->intflags: [none-set]
> Freeing REx: "foo"
>
>
> (some of that output is specific to blead, the relevant parts are in 5.20).
>
> cheers,
> Yves
>
> --
> perl -Mre=debug -e "/just|another|perl|hacker/"

Hi,

Do you have any ETA for "making $/ support regexes"? That would be
awesome.

In the meantime, is it OK to access a regexp's minlen and maxlen from
Perl code? The only functions/macros documented in perlapi for
regular expressions are SvRX and SvRXOK.

What I have in mind is something like this:

---8<----8<----8<----8<----8<----8<----8<----
use 5.020;
use strict;
use warnings;
package Regexp::Len;

require Exporter;
our @ISA = qw( Exporter );
our @EXPORT;
our @EXPORT_OK = qw(regexp_minlen regexp_maxlen);
our %EXPORT_TAGS = (all => \@EXPORT_OK, default => \@EXPORT);

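# Single-quoted heredoc so Perl does not interpolate the C source.
use Inline C => <<'EOC';
/* Minimum length a string must have for the regexp to match. */
int regexp_minlen (SV* re) {
    if (!SvRXOK(re)) {
        croak("not a regexp");
    }
    return RX_MINLEN(SvRX(re));
}

/* Maximum match length; not meaningful for unbounded quantifiers. */
int regexp_maxlen (SV* re) {
    if (!SvRXOK(re)) {
        croak("not a regexp");
    }
    return ReANY(SvRX(re))->maxlen;
}
EOC

1;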
---8<----8<----8<----8<----8<----8<----8<----

This works just fine right now (Strawberry Perl Portable 5.22):
$ perl -Ilib -MRegexp::Len=:all -E'$r = qr/a{1,2}b{3,4}/; say "$r: ", regexp_minlen($r), "..", regexp_maxlen($r)'
(?^u:a{1,2}b{3,4}): 4..6
$

Two questions though:
1) Is there a reason why RX_MAXLEN is not #define'd in regexp.h, on the
model of RX_MINLEN?
#define RX_MINLEN(prog)		(ReANY(prog)->minlen)
2) How bad is it to use undocumented APIs as above?


TIA,

--Christian

