
Re: Exposing regexp engine & compiled regexp's

From: Filipe Brandenburger
Date: January 8, 2001 04:17
Subject: Re: Exposing regexp engine & compiled regexp's
Message ID: perl.perl6.language.regex-582@nntp.perl.org

Damian wrote: 
> 
> I once wrote a C++-based regex engine (much simpler than Perl's!) 
> just like this. 
> 
> Knowing why a regex failed *is* invaluable when matching regexes 
> against file streams, but there are more possibilities than you 
> mentioned: 
> 
>              "Failed"        Did not match because of illegal transition 
> 
>              "Short"         Did not match: did not reach acceptor state 
> 
>              "Exact"         Matched and finished in an acceptor state 
> 
>              "Long"          Passed through an acceptor state, continued to
>                              match, but did not finish in an acceptor state
> 
>              "LongFailed"    Passed through an acceptor state, continued to
>                              match, but then found an illegal transition
> 
> Ultimately, I decided that what was needed wasn't insight into the cause 
> of failure, but rather the chance to provide more data to "feed" the 
> engine so it doesn't have to fail "Short" or "Long". That's why I 
> proposed RFC 93 (http://dev.perl.org/rfc/93.html) instead of a mechanism 
> such as you have suggested. 
> 
> Damian 
> 


Good points, Damian. 

I read your RFC 93. It mentions using a sub to read the string being matched.
I think it uses that sub in two conflicting ways, though: once to request
more data from the stream, and again to signal that there was a match. I
also think that requiring the sub to return _exactly_ the number of
characters requested goes against the convention of most Unix system calls
(like read), which return _at most_ the number of characters requested.
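
To make the read convention concrete, here is a minimal sketch in Python (used only for illustration, it is not part of RFC 93). The names TrickleSource and read_exactly are hypothetical. The source may return fewer characters than requested, and an "exactly n" read is then just a loop layered on top of the "at most n" primitive:

```python
class TrickleSource:
    """Hypothetical source whose read(n) returns AT MOST n characters."""

    def __init__(self, text, max_chunk=2):
        self._text = text
        self._pos = 0
        self._max_chunk = max_chunk  # simulates short reads

    def read(self, n):
        take = min(n, self._max_chunk)
        chunk = self._text[self._pos:self._pos + take]
        self._pos += len(chunk)
        return chunk  # an empty string signals end of data


def read_exactly(source, n):
    """Build 'exactly n' on top of at-most reads by looping."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = source.read(remaining)
        if not chunk:  # end of data: return what we have
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return "".join(parts)


source = TrickleSource("abcdef")
first = read_exactly(source, 4)   # needs two short reads
second = read_exactly(source, 4)  # only two characters are left
```

With the at-most convention the caller decides how to handle a short read; with the exactly-n convention every data source has to implement the loop itself.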

What I think is that this could be handled by an OO module. Suppose there
were a way to hook into the guts of the regexp engine and get responses like
the ones you mentioned above. One could then write an OO module with methods
for reading more data, checking for end of data, and acknowledging a failed
or successful match. The module could overload the =~ operator, making the
regexp engine call the module's methods instead of its own.
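
As a rough sketch of that idea (in Python, for illustration only), here is a toy engine that pulls its input from an object and reports one of the five outcomes Damian listed. Everything here is an assumption: the hard-coded DFA for the pattern ab(cab)*, the class StringSource, and the method names at_end, read_more and acknowledge are hypothetical, not an existing interface.

```python
# Toy DFA for the pattern ab(cab)* ; state 2 is the only acceptor state.
START = 0
ACCEPTING = {2}
TRANSITIONS = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3, (3, "a"): 1}


class StringSource:
    """Hypothetical data source that the engine calls back into."""

    def __init__(self, text, chunk=2):
        self.text, self.pos, self.chunk = text, 0, chunk
        self.result = None

    def at_end(self):
        return self.pos >= len(self.text)

    def read_more(self):
        piece = self.text[self.pos:self.pos + self.chunk]
        self.pos += len(piece)
        return piece

    def acknowledge(self, status):
        self.result = status  # the engine reports how the match ended


def run(source):
    """Drive the DFA from the source and classify the outcome."""
    state = START
    seen_accept = state in ACCEPTING
    while not source.at_end():
        for ch in source.read_more():
            nxt = TRANSITIONS.get((state, ch))
            if nxt is None:  # illegal transition
                status = "LongFailed" if seen_accept else "Failed"
                source.acknowledge(status)
                return status
            state = nxt
            seen_accept = seen_accept or state in ACCEPTING
    # Input exhausted without an illegal transition.
    if state in ACCEPTING:
        status = "Exact"
    elif seen_accept:
        status = "Long"
    else:
        status = "Short"
    source.acknowledge(status)
    return status


outcomes = {text: run(StringSource(text)) for text in ["x", "a", "ab", "abc", "abcx"]}
```

A "Short" or "Long" result tells the caller that more data might still turn the attempt into a match, while "Failed" and "LongFailed" say that no amount of extra data will help.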

Then, what you proposed in RFC 93 through 

    sub { ... } =~ m/.../; 

could be handled by 

    my $mymatch = MyClassForMatchingFromFileHandles->new($myhandle); 
    $mymatch =~ m/.../; 
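
For comparison, here is roughly what such a class can do today without any hooks into the engine, sketched in Python with the standard re module. The class name HandleMatcher and its chunk_size parameter are hypothetical. It simply reads another chunk and retries until something matches or the data runs out:

```python
import io
import re


class HandleMatcher:
    """Hypothetical wrapper that matches against data read from a handle."""

    def __init__(self, handle, chunk_size=4):
        self.handle = handle
        self.chunk_size = chunk_size
        self.buffer = ""

    def search(self, pattern):
        regex = re.compile(pattern)
        while True:
            found = regex.search(self.buffer)
            if found:
                return found
            chunk = self.handle.read(self.chunk_size)  # at most chunk_size
            if not chunk:  # end of data and still no match
                return None
            self.buffer += chunk


name_match = HandleMatcher(io.StringIO("id=12345; name=x")).search(r"name=")
digits_match = HandleMatcher(io.StringIO("id=12345; name=x")).search(r"\d+")
```

The second search stops at the first chunk boundary and returns only "1" rather than "12345", because the engine cannot say that the match ran into the end of the buffer and might have continued. That is the kind of information an exposed engine could hand back to the module.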

What I mean is that, by exposing the guts of the regexp engine, we could
implement everything that's wanted in RFC 93 with a cleaner interface, and
even do more, because we could hook into every call to the regexp engine!

BTW, if you have a C++-based regexp engine with a clean design, couldn't we
use it as the basis for a new regexp engine that supports the current (or
new) Perl regexp syntax and features and has its guts exposed?


Branden. 
