perl.perl6.language.regex | Postings from January 2001

Re: Exposing regexp engine & compiled regexp's

Filipe Brandenburger
January 8, 2001 04:17

Damian wrote: 
> I once wrote a C++-based regex engine (much simpler than Perl's!) 
> just like this. 
> Knowing why a regex failed *is* invaluable when matching regexes 
> against file streams, but there are more possibilities than you 
> mentioned: 
>     "Failed"      Did not match because of illegal transition
>     "Short"       Did not match: did not reach acceptor state
>     "Exact"       Matched and finished in an acceptor state
>     "Long"        Passed through an acceptor state, continued
>                   match, but did not finish in an acceptor state
>     "LongFailed"  Passed through an acceptor state, continued
>                   match, but then found an illegal transition
> Ultimately, I decided that what was needed wasn't insight into the cause 
> of failure, but rather the chance to provide more data to "feed" the 
> engine so it doesn't have to fail "Short" or "Long". That's why I 
> proposed RFC 93 instead of a mechanism
> such as you have suggested. 
> Damian 
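
To make the five outcomes concrete, here is a minimal sketch that classifies a
whole input against a toy DFA. The automaton (for the pattern a(bc)*), the
%delta and %accept tables, and the classify() name are all assumptions made
for illustration; none of this is Damian's engine or Perl's.

```perl
use strict;
use warnings;

# Hypothetical toy DFA for a(bc)*: 0 -a-> 1 (acceptor), 1 -b-> 2, 2 -c-> 1
my %delta  = (0 => { a => 1 }, 1 => { b => 2 }, 2 => { c => 1 });
my %accept = (1 => 1);

# Feed the whole input to the DFA and name the outcome.
sub classify {
    my ($input) = @_;
    my $state       = 0;
    my $seen_accept = $accept{$state} ? 1 : 0;
    for my $ch (split //, $input) {
        my $next = $delta{$state}{$ch};
        # No transition for this character: an illegal transition.
        return $seen_accept ? 'LongFailed' : 'Failed' unless defined $next;
        $state = $next;
        $seen_accept = 1 if $accept{$state};
    }
    # Input is exhausted: did we stop in an acceptor state?
    return 'Exact' if $accept{$state};
    return $seen_accept ? 'Long' : 'Short';
}

print "'$_' => ", classify($_), "\n" for '', 'a', 'ab', 'abx', 'x';
```

"Short" and "Long" are the two outcomes where more input could still change
the answer, which is why they matter when matching against a stream.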

Good points, Damian. 

I read your RFC 93. It mentions using a sub to read from the string. I just
think it uses the sub in two conflicting ways: one for requesting more data
from the stream and another for signalling that there was a match. I also
thought that requiring it to return _exactly_ the number of characters
requested goes against the convention of most Unix syscalls (like read...),
which are asked to read at most that number of characters.
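
As a rough sketch of the "at most" convention, the callback below may return
fewer characters than requested, and the caller loops until it has enough or
the data runs out. The names make_source() and fill() are made up for this
example and are not part of RFC 93; only Perl's built-in read() is real.

```perl
use strict;
use warnings;

# Hypothetical data source: hands back at most $want characters, possibly
# fewer, and an empty string once the data is exhausted.
sub make_source {
    my ($fh) = @_;
    return sub {
        my ($want) = @_;
        my $buf = '';
        my $got = read($fh, $buf, $want);
        die "read failed: $!" unless defined $got;
        return $buf;
    };
}

# Caller-side loop: keep asking until $need characters arrive or input ends.
sub fill {
    my ($source, $need) = @_;
    my $text = '';
    while (length($text) < $need) {
        my $chunk = $source->($need - length $text);
        last if $chunk eq '';
        $text .= $chunk;
    }
    return $text;
}

my $data = "hello world";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my $source = make_source($fh);
print fill($source, 5), "\n";
```

With this split, the source never has to block or pad just to satisfy an exact
count; deciding what to do with a short read stays with the caller.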

What I think is that it could be handled by an OO module. Suppose there were
a way to hook into the regexp engine guts and get responses like the ones you
mentioned above. One could write an OO module with methods for reading more
data, checking for end of data, and acknowledging a failed or successful match.
Then it could overload the =~ operator, making the regexp engine call the
module's methods instead of its own.

Then, what you proposed in RFC 93 through 

    sub { ... } =~ m/.../; 

could be handled by 

    my $mymatch = MyClassForMatchingFromFileHandles->new($myhandle); 
    $mymatch =~ m/.../; 
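
A minimal sketch of what such a class might look like follows. Perl 5's
overload pragma cannot overload =~ on the target side, so the match_with()
function here is a hypothetical stand-in for the engine hook, and every method
name (more_data, at_end, matched, failed) is an assumption rather than an
existing API. The stand-in simply retries the match whenever the result is
still undecided, which is much cruder than a real hook into the engine.

```perl
use strict;
use warnings;

package MyClassForMatchingFromFileHandles;

sub new {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buf => '', eof => 0, result => undef }, $class;
}

# Pull up to $want more characters into the buffer; returns how many arrived.
sub more_data {
    my ($self, $want) = @_;
    my $chunk = '';
    my $got = read($self->{fh}, $chunk, $want);
    die "read failed: $!" unless defined $got;
    $self->{eof} = 1 if $got == 0;
    $self->{buf} .= $chunk;
    return $got;
}

sub at_end  { $_[0]{eof} }
sub buffer  { $_[0]{buf} }
sub matched { my ($self, $text) = @_; $self->{result} = $text }
sub failed  { $_[0]{result} = undef }
sub result  { $_[0]{result} }

package main;

# Stand-in for the engine hook: ask for more data while the outcome could
# still change, that is, on failure or on a match touching the buffer's end.
sub match_with {
    my ($obj, $re, $chunk_size) = @_;
    while (1) {
        if ($obj->buffer() =~ $re) {
            my ($start, $end) = ($-[0], $+[0]);
            if ($end < length($obj->buffer()) || $obj->at_end()) {
                $obj->matched(substr($obj->buffer(), $start, $end - $start));
                return 1;
            }
        }
        elsif ($obj->at_end()) {
            $obj->failed();
            return 0;
        }
        $obj->more_data($chunk_size);
    }
}

my $text = "foo 12345 bar";
open my $myhandle, '<', \$text or die "cannot open in-memory handle: $!";
my $mymatch = MyClassForMatchingFromFileHandles->new($myhandle);
print $mymatch->result(), "\n" if match_with($mymatch, qr/\d+/, 4);
```

Reading in chunks of 4, a match that ends exactly at the end of the buffer is
treated as undecided and more data is requested, so the digits are not cut off
at a chunk boundary.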

What I mean is, by exposing the guts of the regexp engine, we could
implement everything that's wanted in RFC 93 with a cleaner interface, and
even do more, because we could hook into every call to the regexp engine!

BTW, if you have a C++-based regexp engine with a clean design, couldn't we
use it as the base for a new regexp engine that supports Perl's current (or
new) regexp syntax and features and has its guts exposed?

