
All I want for Christmas is: Streaming Regexps

From: Paul "LeoNerd" Evans
Date: December 25, 2014 15:02
Subject: All I want for Christmas is: Streaming Regexps
Message ID: 20141225150218.0df19f9b@shy.leonerd.org.uk

Given a string, $str, and a regexp, $re, I know that the regexp does
not currently match:

  ok( not $str =~ $re );

However, Perl currently gives me no way to distinguish between two
importantly different cases:

  1) $str contains characters that cause $re not to match
  2) $str does not contain enough characters to cause $re to match

For example, consider

  $str = '"here is';
  $re  = qr/^"[^"]+"/;

Currently $str does not match $re, but that's only because of a lack of
characters; if we were to supply more characters, such as from
read()ing more from a file or IO handle, we might find that the regexp
now matches. Alternatively, given

  $str = 'This will never';
  $re  = qr/^\d+/;

it is immediately clear that $str can never match $re, no matter how
much more input we read and append to it.
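
To make the three outcomes concrete, here is a minimal sketch of the
distinction for the two examples above. It is not a general solution:
classify() is a hypothetical helper, and the "prefix" regexps are
derived by hand for each pattern, which is exactly the work one would
like the regexp engine to do itself.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Perl or any CPAN module.
# $re        - the real pattern
# $prefix_re - hand-written pattern matching every string that could
#              still grow into a match of $re (anchored at \z)
sub classify {
    my ($str, $re, $prefix_re) = @_;
    return 'match'     if $str =~ $re;         # already matches
    return 'need_more' if $str =~ $prefix_re;  # happy so far, ran out of input
    return 'fail';                             # no extra input can help
}

# Quoted string: any prefix is either empty or '"' followed by non-quotes.
my $quoted        = qr/^"[^"]+"/;
my $quoted_prefix = qr/^(?:"[^"]*)?\z/;

# Leading digits: only the empty string can still grow into a match.
my $digits        = qr/^\d+/;
my $digits_prefix = qr/^\z/;

print classify('"here is',        $quoted, $quoted_prefix), "\n";
print classify('This will never', $digits, $digits_prefix), "\n";
print classify('"here is"',       $quoted, $quoted_prefix), "\n";
```

Writing the prefix pattern by hand is feasible for patterns this
small, but it does not scale to arbitrary regexps.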

I have occasionally hit cases in Parser::MGC where being able to
make this distinction would be really useful. Right now it has a
partial attempt at a lazy-streaming mode, but that can only operate on
whole blocks separated by ignorable whitespace. It would be really nice
if Parser::MGC could drive a lazy socket read() or similar, to continue
reading input until it matched an entire AST-driven document, of
whatever syntax was being parsed.
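
As a rough illustration of that kind of read loop, the sketch below
keeps reading from a filehandle until the buffer either matches or can
no longer match. It is not how Parser::MGC works: read_until_decided()
is a hypothetical function, the prefix regexp is again written by
hand, and an in-memory filehandle stands in for a socket.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $chunk bytes at a time until the outcome
# is known. Returns a status ('match', 'fail' or 'eof') and the buffer.
sub read_until_decided {
    my ($fh, $re, $prefix_re, $chunk) = @_;
    my $buf = '';
    while (1) {
        return ('match', $buf) if $buf =~ $re;
        # Not a viable prefix: more input will never help.
        return ('fail', $buf) unless $buf =~ $prefix_re;
        # Still viable, so append more input to the end of the buffer.
        my $got = read($fh, $buf, $chunk, length $buf);
        die "read failed: $!" unless defined $got;
        return ('eof', $buf) if $got == 0;
    }
}

# An in-memory filehandle stands in for a socket or file here.
my $input = qq{"here is a string" and trailing text};
open my $fh, '<', \$input or die "open: $!";

my ($status, $buf) =
    read_until_decided($fh, qr/^"[^"]+"/, qr/^(?:"[^"]*)?\z/, 4);
print "status: $status\n";
```

The loop stops reading as soon as the outcome is known, which is the
behaviour a lazy parser would want from the regexp engine directly.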

-----

In summary:

  I'd like a way to know whether a regexp failed to match because it ran
  out of input but was happy up to that point, or because it found some
  bad characters that no amount of extra input will ever fix.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
http://www.leonerd.org.uk/  |  https://metacpan.org/author/PEVANS
