perl.perl5.porters | Postings from August 2002

Re: Ideas for 5.10

Nicholas Clark
August 9, 2002 15:26
On Fri, Aug 09, 2002 at 04:59:10PM -0400, Benjamin Goldberg wrote:
> Nicholas Clark wrote:
> > 
> > On Fri, Aug 09, 2002 at 01:15:15AM -0400, Benjamin Goldberg wrote:

> > > Another thing I would like to see would be to allow $/ to be a qr//
> > > regex.

> > I'd like to see this, but without a rewrite of the regexp engine to
> > allow the engine to accept incomplete strings with an associated "get
> > more" function I cannot see how it could be implemented for the
> > general case without internally unsetting $/, slurping the file, and
> > then finding the regexp.
> Actually, I wasn't quite suggesting that...
> Attempt to match the data that's buffered for input against the regex,
> and if it doesn't succeed, the perlio part would perform another read call,
> not the regex engine.  In other words, something like:
>    until( $buf =~ $regex ) {
>       sysread $input_handle, $buf, blah, length $buf;
>    }
>    return substr( $buf, 0, $+[0], "" );
> It might cache $-[0], so that a later chomp() removes from $-[0] to the
> end of the string.  (I'm not sure on this though, since I'm not sure
> how/where one would cache this -- the purpose of caching it is to avoid
> having to do another regex match for chomp)

I believe that this would only work for non-greedy regexps.

In the case of

$/ = qr/The .*End/;

then "       The          End                         End   "

should treat the longer match as being the RS. If a buffer boundary happened
about                                ^ here
then the regexp engine would successfully find a shorter match, which would
not be correct. The position of the match would then vary depending on
the block size used to read the file.
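A minimal sketch of that block-size dependence, assuming the file is emulated by an in-memory string read in fixed-size chunks. The helper name first_separator and the block sizes 32 and 64 are hypothetical, chosen only so that one boundary falls between the two "End"s:

```perl
use strict;
use warnings;

# Hypothetical helper: pull $data into a buffer $block_size characters at a
# time and return the first text matched by $rs, retrying after each chunk.
sub first_separator {
    my ($data, $rs, $block_size) = @_;
    my ($buf, $pos) = ('', 0);
    while (1) {
        if ($buf =~ $rs) {
            # @- and @+ hold the start and end offsets of the match
            return substr($buf, $-[0], $+[0] - $-[0]);
        }
        return if $pos >= length $data;    # "EOF": no separator found
        $buf .= substr($data, $pos, $block_size);
        $pos += $block_size;
    }
}

# Same layout as the example string: The ... End ... End
my $data = (' ' x 7) . 'The' . (' ' x 10) . 'End' . (' ' x 25) . 'End' . (' ' x 3);
my $rs   = qr/The .*End/;    # greedy

for my $block_size (32, 64) {
    my $sep = first_separator($data, $rs, $block_size);
    printf "block size %2d: separator is %d characters long\n",
        $block_size, length $sep;
}
# block size 32: separator is 16 characters long
# block size 64: separator is 44 characters long
```

With a 32-character block the buffer ends before the second "End", so the match stops at the first one. With a 64-character block the whole string is buffered and the greedy match runs to the second.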

It's probably not unreasonable to limit $/ to non-greedy regexps, because most
of the time they're more likely to do what you want, and to do it fast.
The perl6 folks are going to have to be careful to use non-greedy regexps in
their perl6 grammar, else there is going to be a lot of needless whole script
slurping and much backtracking during the compilation phase.
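For comparison, a sketch of the non-greedy case under the same assumptions (an in-memory string standing in for the file; the helper first_separator and the block sizes are hypothetical). It only shows that this particular pattern gives the same answer for every block size tried, not that every non-greedy pattern is safe:

```perl
use strict;
use warnings;

# Hypothetical helper: pull $data into a buffer $block_size characters at a
# time and return the first text matched by $rs, retrying after each chunk.
sub first_separator {
    my ($data, $rs, $block_size) = @_;
    my ($buf, $pos) = ('', 0);
    while (1) {
        if ($buf =~ $rs) {
            return substr($buf, $-[0], $+[0] - $-[0]);
        }
        return if $pos >= length $data;    # "EOF": no separator found
        $buf .= substr($data, $pos, $block_size);
        $pos += $block_size;
    }
}

my $data = (' ' x 7) . 'The' . (' ' x 10) . 'End' . (' ' x 25) . 'End' . (' ' x 3);
my $rs   = qr/The .*?End/;    # non-greedy

for my $block_size (8, 16, 32, 64) {
    my $sep = first_separator($data, $rs, $block_size);
    printf "block size %2d: separator is %d characters long\n",
        $block_size, length $sep;
}
# every block size reports a 16-character separator
```

The non-greedy match stops at the first "End" as soon as it is buffered, so reading further never changes the result for this input.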

Nicholas Clark
Even better than the real thing:
