
Re: qr// in $/ (was Re: Ideas for 5.10)

From: Nicholas Clark
Date: August 16, 2002 16:24
Subject: Re: qr// in $/ (was Re: Ideas for 5.10)
Message ID: 20020817002403.M97456@plum.flirble.org

On Fri, Aug 16, 2002 at 07:18:21PM -0400, Benjamin Goldberg wrote:
> Nicholas Clark wrote:
> [snip]
> > For regexps such as /^__[A-Z]+__$/ that we know can't match a newline
> > in the middle, we simply look to see if there is a newline in the
> > PerlIO buffer between the file pointer and the end of buffer. If there
> > is, we call the regexp engine, saying that the end of where it can
> > match to is the last newline we know of in the PerlIO buffer.
> >
> > If there isn't a newline in the PerlIO buffer (or the regexp engine
> > fails to find a match) we read more data from disk (or whatever) until
> > we find a newline. (I'm envisaging reading a disk block, then scanning
> > for the last newline, rather than some sort of fgets())
> 
> This seems more like an optimization for the regex engine to do, not the
> PerlIO using it.

But that would mean that the regexp engine would have to know how to
"get more" (in this case by doing IO).
If we do the decision making outside the regexp engine, then the regexp
engine needs no change from the present, and there's no recursion danger
(PerlIO calling out to layers written in Perl that in turn use regexps).
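
To make the "decide outside the regexp engine" idea concrete, here is a minimal sketch in plain Perl. It is only an illustration, not how PerlIO would do it in C: read_record(), its arguments and the block size are hypothetical names, and it assumes the separator is a line-anchored qr//m pattern that can never match across a newline.

```perl
use strict;
use warnings;

# Hypothetical helper (not a PerlIO or core API). Returns the next record
# from $fh, where a record ends at a match of $sep. $buf_ref points to a
# scalar holding unconsumed data between calls. Assumes $sep is compiled
# with /m and cannot match across a newline.
sub read_record {
    my ($fh, $buf_ref, $sep, $block_size) = @_;
    $block_size ||= 4096;
    while (1) {
        # Only data up to the last newline seen so far is offered to the
        # regexp engine; the engine itself never has to "get more".
        my $limit = rindex($$buf_ref, "\n");
        if ($limit >= 0 && substr($$buf_ref, 0, $limit + 1) =~ $sep) {
            # Remove the record (including the separator) from the buffer.
            return substr($$buf_ref, 0, $+[0], '');
        }
        # No usable newline, or no match yet: read another block.
        my $got = read($fh, $$buf_ref, $block_size, length $$buf_ref);
        die "read failed: $!" unless defined $got;
        next if $got;
        # EOF: whatever is left is the final record, if anything is left.
        return undef unless length $$buf_ref;
        my $end = $$buf_ref =~ $sep ? $+[0] : length $$buf_ref;
        return substr($$buf_ref, 0, $end, '');
    }
}

# Usage on an in-memory file, with a tiny block size to force refills.
my $data = "a\nb\n__END__\nc\nd\n__DATA__\ne";
open my $fh, '<', \$data or die "open: $!";
my $buf = '';
my @records;
while (defined(my $rec = read_record($fh, \$buf, qr/^__[A-Z]+__$/m, 4))) {
    push @records, $rec;
}
```

The sketch rescans the buffer from the start after every refill, which a real implementation would avoid by remembering how far it had already searched. Like a string in $/, the separator stays at the end of each record, so the newline that follows it begins the next one.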

> > For regexps where we can't figure out whether they are greedy and
> > could backtrack, we fall back to reading until EOF and presenting the
> > lot to the regexp engine. A flag that the knowledgeable could add to
> > their regexp saying "I promise this regexp can't backtrack in a greedy
> > way" (or whatever is most useful to the PerlIO/regexp system) would be
> > useful.
> 
> Turn the 'is it nongreedy' question around ... what happens if we find
> out that the qr regex that was placed in $/ *is* greedy (but can
> backtrack to match a lesser part)?  Do we carp, croak, ignore the
> problem (and possibly match wrong), or do it "right" by deferring
> running the regex until EOF is seen?

Don't know. I'd like to have the "do EOF" option at least available, as this
would be most useful for processing disk files that have historically
been processed with undef $/ and a regexp on $_.
Now we could readline() them.
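
For reference, the historical idiom mentioned above looks roughly like the following sketch. slurp_records() is a hypothetical name, and the loop that cuts the text into records is just one way of applying a regexp to the slurped data.

```perl
use strict;
use warnings;

# Hypothetical helper: the traditional "undef $/ and a regexp" approach.
# Reads the whole file into memory, then cuts it after each match of $sep.
sub slurp_records {
    my ($fh, $sep) = @_;
    local $/;                       # undef $/ means "read until EOF"
    my $text = <$fh>;
    return () unless defined $text;

    my @records;
    my $start = 0;
    while ($text =~ /$sep/g) {
        # $+[0] is the offset just past the end of the current match.
        push @records, substr($text, $start, $+[0] - $start);
        $start = $+[0];
    }
    # Anything after the last separator is the final record.
    push @records, substr($text, $start) if $start < length $text;
    return @records;
}

# Usage on an in-memory file.
my $slurp_data = "a\nb\n__END__\nc\nd\n__DATA__\ne";
open my $slurp_fh, '<', \$slurp_data or die "open: $!";
my @slurped = slurp_records($slurp_fh, qr/^__[A-Z]+__$/m);
```

This is the behaviour the "do EOF" option would preserve: the regexp engine sees the entire file at once, so greedy matching and backtracking cannot give a different answer than they do today. The cost is that nothing is returned until EOF has been seen.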

But having the croak or carp available to anyone using qr// in $/ on a
terminal, socket or pipe would also be handy. Otherwise they are going to sit
forever and wonder what's going on.
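
One way such a warning could be triggered is sketched below, using Perl's -t, -p and -S file tests to spot handles where "read until EOF" may never finish. Both function names and the $promised_safe flag (standing in for the "I promise this regexp can't backtrack" flag discussed earlier) are hypothetical, and the message text is made up.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical check: true if $fh is a terminal, a pipe or a socket,
# i.e. a handle on which waiting for EOF may block indefinitely.
sub may_block_forever {
    my ($fh) = @_;
    return ((-t $fh) || (-p $fh) || (-S $fh)) ? 1 : 0;
}

# Hypothetical policy: returns true if a regexp separator is fine to use
# on $fh, otherwise carps and returns false. $promised_safe stands for a
# user promise that the regexp cannot backtrack in a greedy way.
sub check_separator {
    my ($fh, $promised_safe) = @_;
    return 1 if $promised_safe;
    return 1 unless may_block_forever($fh);
    carp "regexp in \$/ may have to read to EOF on a terminal, pipe or socket";
    return 0;
}
```

Whether this should carp, croak, or stay silent is exactly the open question here; the sketch only shows that the handle type is cheap to check before any reading starts.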

And in that case, at a minimum we only need to carp/croak/redo if we
find that we did match after backtracking.

I don't think I've answered your question properly. Sorry.

Nicholas Clark
