develooper Front page | perl.perl6.language.regex | Postings from January 2001

Re: Exposing regexp engine & compiled regexp's

January 8, 2001 19:16
Damian Conway wrote:

> # As Branden proposes:
> package From_STDIN;
> sub new       { bless $_[1], $_[0] }
> sub MORE_DATA { $_[0]->getn($_[1]) }
> sub ON_FAIL   { $_[0]->pushback($_[1]) }
> use overload "=~" => 1;
> package main;
> From_STDIN->new($fh) =~ /pat/;
>
> Hmmmm. Potentially more flexible, but also much more ponderous.

Sorry I didn't include code the first time, but my idea is about
much more flexibility than having MORE_DATA and ON_FAIL methods
in an object with an overloaded ``=~''. I think the whole interface of
the regex engine should be exposed to Perl, so that someone could
write an OO package with ``virtual methods'' MORE_DATA and ON_FAIL
and manage the guts of the engine so that it behaves as expected.

Something like:

        package RegexBase;

        use overload '=~' => \&match;   # proposed: `=~' is not overloadable in perl5

        sub match {    # here are the brains of the class
            # something involving:
            # - the guts of the interface of the Regex Engine
            # - the MORE_DATA method when data is needed
            # - the ON_FAIL method when a match fails
        }

        package From_STDIN;
        @ISA = qw(RegexBase);
        sub MORE_DATA { ... }
        sub ON_FAIL { ... }
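
To make the division of labour concrete, here is a rough sketch of the same
shape in today's perl5. It is only an emulation under stated assumptions: the
engine's guts are not exposed, so match() is an ordinary method (not an
overloaded ``=~'') that retries the whole pattern each time more data arrives.
The From_Chunks class, the chunks and pushed_back fields, and the choice to
return the matched text are all hypothetical names made up for this example.

```perl
use strict;
use warnings;

package RegexBase;

sub new {
    my ($class, %args) = @_;
    return bless { buffer => '', %args }, $class;
}

# Emulated driver loop: try the pattern against what has been buffered,
# ask the subclass for more data on failure, and report the final failure
# through ON_FAIL once the source is exhausted.
sub match {
    my ($self, $re) = @_;
    while (1) {
        if ($self->{buffer} =~ $re) {
            my $matched = substr($self->{buffer}, $-[0], $+[0] - $-[0]);
            # keep whatever follows the match for a later call
            $self->{buffer} = substr($self->{buffer}, $+[0]);
            return $matched;
        }
        my $chunk = $self->MORE_DATA;
        if (!defined $chunk) {
            $self->ON_FAIL($self->{buffer});
            return;
        }
        $self->{buffer} .= $chunk;
    }
}

package From_Chunks;    # hypothetical stand-in for From_STDIN
our @ISA = ('RegexBase');

# Hand out the next chunk, or undef when there is nothing left.
sub MORE_DATA {
    my ($self) = @_;
    return shift @{ $self->{chunks} };
}

# Remember the unmatched text so the caller can push it back.
sub ON_FAIL {
    my ($self, $unmatched) = @_;
    $self->{pushed_back} = $unmatched;
}

package main;

my $src = From_Chunks->new(chunks => ["foo\n", "bar\nb", "az\n"]);
my $hit = $src->match(qr/bar\nbaz/);    # the match spans two chunks
```

Note what this emulation cannot do: it returns as soon as the buffered text
matches, so it cannot tell a Short match from an Exact one, and it rescans the
buffer from the start on every retry. Those are the things that need the
engine itself to call MORE_DATA.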

What I'm trying to say is that this is the most flexible approach we can
take. If one wants to check success or failure, he can. If he wants to
see whether the result was Failed/Short/Exact/Long/LongFailed
(words from your message), he can do that too. If he wants to inspect
the state in which the NFA stopped, he can do that as well. Virtually
anything that involves regexps can be built up from there. I think your
`sub' idea, although a bit confusing because the same sub carries more
than one meaning, covers the most common case, and I think it should
be implemented too.

My only remark is that I believe all of Perl should be as exposed
as possible, so that unprecedented levels of introspection
can be achieved. It was in that spirit that I wrote up my idea about
exposing the engine's guts.

I know it's heavy to do things the way I describe. First, that's the
price of the flexibility it gives (although both approaches can safely
be implemented together; they are complementary, not conflicting).
Second, this is meant to be used by module writers, for problems
that can't be solved well today, where all the complexity it introduces
would still be a big win compared with the way they have to be
handled in perl5 (namely, reading the whole file into memory, or
breaking up a regexp that would match more than one line, not to
mention the spaghetti flow control that results!)

BTW, I didn't see any comments about my second thought, the one
of inspecting compiled regexps. Did you like it? That also goes in
the direction of exposing all the internals to the module writers...

