
RFC 317 (v2) Access to optimisation information for regular expressions

From:
Perl6 RFC Librarian
Date:
September 30, 2000 23:32
Subject:
RFC 317 (v2) Access to optimisation information for regular expressions
Message ID:
20001001062423.700.qmail@tmtowtdi.perl.org
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Access to optimisation information for regular expressions

=head1 VERSION

  Maintainer: Hugo van der Sanden (hv@crypt0.demon.co.uk)
  Date: 25 Sep 2000
  Last Modified: 30 Sep 2000
  Mailing List: perl6-language-regex@perl.org
  Number: 317
  Version: 2
  Status: Frozen

=head1 NOTES ON FREEZING

No comments were received on this RFC apart from a reference in
RFC 72 (v4), which hopes that the concept will be extended to permit
the caller to set study()-type information.

If/when time permits I'll try a patch to perl5 to see how easy it
is and to discover whether anyone other than Peter and me wants it.

=head1 ABSTRACT

Currently you can see optimisation information for a regexp only
by running with -Dr in a debugging perl and looking at STDERR.
There should be an interface that allows us to read this information
programmatically and possibly to alter it.

=head1 DESCRIPTION

At its core, the regular expression matcher knows how to check
whether a pattern matches a string starting at a particular location.
When the regular expression is compiled, perl may also look for
optimisation information that can be used to rule out some or all
of the possible starting locations in advance.

Currently you can find out about the optimisation information
captured for a particular regexp only in a perl built with
DEBUGGING, by turning on -Dr:

  % perl -Dr -e 'qr{test.*pattern}'
  Compiling REx `test.*pattern'
  size 8 first at 1
  rarest char p at 0
  rarest char s at 2
     1: EXACT <test>(3)
     3: STAR(5)
     4:   REG_ANY(0)
     5: EXACT <pattern>(8)
     8: END(0)
  anchored `test' at 0 floating `pattern' at 4..2147483647 (checking floating) minlen 11 
  Omitting $` $& $' support.
  
  EXECUTING...
  
  Freeing REx: `test.*pattern'
  %
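
To make the last summary line of that output concrete, here is a
minimal sketch of how such information can rule out a string before
the matcher proper runs. The helper name may_match is hypothetical,
and the three values are simply copied by hand from the -Dr output
above; this is not how the engine itself is written.

```perl
use strict;
use warnings;

# Hypothetical helper: a cheap necessary-condition check built from
# the values reported by -Dr for qr{test.*pattern}.
sub may_match {
    my ($str) = @_;
    my ($fixed, $floating, $minlen) = ('test', 'pattern', 11);

    # Too short to hold any match at all.
    return 0 if length($str) < $minlen;

    # The fixed string must occur somewhere ...
    my $p = index($str, $fixed);
    return 0 if $p < 0;

    # ... and the floating string must occur at least 4 characters
    # after the start of the earliest fixed string.
    return index($str, $floating, $p + length $fixed) >= 0 ? 1 : 0;
}

for my $s ('a test of the pattern', 'pattern then test', 'test') {
    my $pre  = may_match($s);
    my $real = $s =~ /test.*pattern/ ? 1 : 0;
    print "$s: prefilter=$pre match=$real\n";
}
```

A string that fails the check cannot match; a string that passes it
still has to be handed to the real matcher, since the check knows
nothing about what lies between the two substrings.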

For some purposes it would help to be able to get at this information
programmatically: the test suite could take advantage of this (to test
that optimisations occur as expected), and it could also be useful for
enhanced development tools, such as a graphical regexp debugger.

Additionally there are times when the programmer is able to supply
optimisation information that the regexp engine cannot discover for
itself. While we could consider making it possible to modify these
values, it is important to remember that they are only hints: the
regexp engine is free to ignore them. So there is a danger that
people will misuse writable optimisation information to move part of
the logic out of the regexp, and then blame us when it breaks.

Suggested example usage:

  % perl -wl
  use re;
  $a = qr{test.*pattern};
  print join ':', $a->fixed_string, $a->floating_string, $a->minlen;
  __END__
  test:pattern:11
  %

... but perhaps a single new method returning a hashref would be
cleaner and more extensible:

  $opt = $a->optimisation;
  print join ':', @$opt{qw/ fixed_string floating_string minlen /};

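The methods above are only proposed. As a rough approximation of the
hashref form, later perl5 releases (5.10 onward) ship a related
function, re::regmust, which returns the anchored and floating
strings found by the optimiser. The sketch below wraps it in a
hypothetical optimisation() subroutine; the key names follow this
RFC, and minlen is left out because regmust does not report it.

```perl
use strict;
use warnings;
use re qw(regmust);   # existing function in perl 5.10+, not part of this RFC

# Hypothetical wrapper imitating the proposed $qr->optimisation hashref.
# Either value may be undef if the optimiser found no such string.
sub optimisation {
    my ($qr) = @_;
    my ($anchored, $floating) = regmust($qr);
    return {
        fixed_string    => $anchored,
        floating_string => $floating,
    };
}

my $opt = optimisation(qr{test.*pattern});
for my $key (qw/ fixed_string floating_string /) {
    my $val = defined $opt->{$key} ? $opt->{$key} : '(none)';
    print "$key: $val\n";
}
```

A test script could compare these values against the strings it
expects the optimiser to find, which is the test-suite use suggested
earlier.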
=head1 IMPLEMENTATION

Straightforward: add interface functions within the perl core to give
access to read and/or write the optimisation values; add methods in
re.pm that use XS code to reach the internal functions.

=head1 REFERENCES

Prompted by discussion of RFC 72: The regexp engine should go backward
as well as forward.



