Re: Ideas for 5.10
From: Benjamin Goldberg
Date: August 9, 2002 13:58
Subject: Re: Ideas for 5.10
Message ID: 3D542D1E.8B1EAD08@earthlink.net
Nicholas Clark wrote:
>
> On Fri, Aug 09, 2002 at 01:15:15AM -0400, Benjamin Goldberg wrote:
> > Arthur Bergman wrote:
> > >
> > > Hi,
> > >
> > > Here is a list of things that I would like to see in 5.10
> >
> > Another thing I would like to see would be to allow $/ to be a qr//
> > regex.
>
> But you'd break one entry in perlfaq6:
>
> =head2 I put a regular expression into $/ but it didn't work. What's
> wrong?
>
> $/ must be a string, not a regular expression. Awk has to be better
> for something. :-)
>
> And then what would awk be good for?
>
> I'd like to see this, but without a rewrite of the regexp engine to
> allow the engine to accept incomplete strings with an associated "get
> more" function I cannot see how it could be implemented for the
> general case without internally unsetting $/, slurping the file, and
> then finding the regexp.
Actually, I wasn't quite suggesting that...
Attempt to match the data that's buffered for input against the regex,
and if it doesn't succeed, the perlio part, not the regex engine, would
perform another read call. In other words, something like:
    until ( $buf =~ $regex ) {
        # "blah" stands in for the read size; new data is appended to $buf
        sysread $input_handle, $buf, blah, length $buf;
    }
    return substr( $buf, 0, $+[0], "" );
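A fuller sketch of the same loop in plain Perl, for anyone who wants to try the idea outside of perlio. Everything here is an assumption made for illustration: read_record is a hypothetical helper (nothing in perl has that name), the 4096-byte default stands in for "blah", and read() is swapped in for sysread so the example also runs against an in-memory filehandle. It also handles EOF, which the loop above leaves out.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of perl or PerlIO: return the next record
# from $fh, where $sep is a qr// object playing the role of $/.
# $$buf_ref holds input that has been read but not yet returned.
sub read_record {
    my ( $fh, $sep, $buf_ref, $chunk ) = @_;
    $chunk ||= 4096;    # assumed read size, standing in for "blah"
    while (1) {
        # If a separator is already buffered, cut the record off the front.
        if ( $$buf_ref =~ $sep ) {
            return substr( $$buf_ref, 0, $+[0], "" );
        }
        # Otherwise append more input to the end of the buffer.
        my $n = read( $fh, $$buf_ref, $chunk, length $$buf_ref );
        die "read failed: $!" unless defined $n;
        if ( $n == 0 ) {    # EOF: return the unterminated tail, if any
            return undef unless length $$buf_ref;
            return substr( $$buf_ref, 0, length $$buf_ref, "" );
        }
    }
}

# Usage: read an in-memory "file" whose lines end in \n or \r\n,
# four bytes at a time so that a separator straddles two reads.
my $data = "one\ntwo\r\nthree";
open my $fh, '<', \$data or die "open failed: $!";
my ( $buf, @records ) = ("");
while ( defined( my $rec = read_record( $fh, qr/\r?\n/, \$buf, 4 ) ) ) {
    push @records, $rec;
}
```

Note that this only behaves well for separators that cannot match early. A pattern such as qr/\n+/ can match at the end of the buffered data before the whole run of newlines has been read.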
The read might also cache $-[0], so that a later chomp() removes
everything from $-[0] to the end of the string. (I'm not sure about
this, though, since I don't know how or where one would cache it. The
purpose of caching it is to avoid having to do another regex match for
chomp.)
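One way to picture the caching idea in user-level Perl is to keep the separator's start offset next to the record text. This is only a sketch under assumptions: take_record and chomp_record are hypothetical names, and storing the offset in a hash is just a stand-in, since where perl itself would keep it is exactly the open question.

```perl
use strict;
use warnings;

# Hypothetical: cut the first record off the front of $$buf_ref and remember
# where its separator started ($-[0]), so chomp needs no second match.
sub take_record {
    my ( $buf_ref, $sep ) = @_;
    $$buf_ref =~ $sep or return;
    my ( $sep_start, $sep_end ) = ( $-[0], $+[0] );
    my $text = substr( $$buf_ref, 0, $sep_end, "" );
    return { text => $text, sep_start => $sep_start };
}

# Hypothetical chomp: drop everything from the cached offset to the end of
# the record and return the number of characters removed.
sub chomp_record {
    my ($rec) = @_;
    my $removed = length( $rec->{text} ) - $rec->{sep_start};
    substr( $rec->{text}, $rec->{sep_start} ) = "";
    return $removed;
}

my $pending = "key=1;;  next";
my $rec     = take_record( \$pending, qr/;+\s*/ );
my $raw     = $rec->{text};            # record with its separator
my $removed = chomp_record($rec);      # no regex match happens here
```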
> However, IIRC Hugo was planning on regexp engine re-writing. I guess
> all it actually needs is a trigger when the regexp engine hits the end
> of the string to call an optionally supplied function to get more
> string, and then repeat. The true end is only reached when the get
> more function returns "no more", which for files would be EOF.
>
> However, this idea is not simple if the regexp engine in all its
> backtracking is keeping absolute pointers to match points in the
> string (and expecting a string contiguous in memory) as there's no way
> a "get more" function could reallocate a buffer larger, or hang the
> extra into another buffer (making a discontinuous string)
I can see little need for a "get more" hook at the present time.
All the work needed for putting a regex in $/ can be done by perlio; no
serious changes to the regex engine are needed.
The only change *I* can think of that we might want for this is a
special EOF anchor, with the following semantics: Before we've
reached EOF, the \z anchor marks the end of the data that's buffered so
far, and the EOF anchor always fails. After we've reached EOF, the EOF
anchor succeeds at the same point that \z succeeds (the end of the
buffer).
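To make those semantics concrete, here is a rough emulation that needs no engine change. It is not a real anchor: eof_anchor and paragraph_sep are hypothetical helpers that simply rebuild the pattern depending on whether EOF has been seen, using (?!) for "always fails" and \z for "end of buffer".

```perl
use strict;
use warnings;

# Hypothetical emulation of the proposed EOF anchor: before EOF it never
# matches, after EOF it behaves like \z.
sub eof_anchor {
    my ($at_eof) = @_;
    return $at_eof ? qr/\z/ : qr/(?!)/;
}

# Separator: a run of newlines that is known to be complete, either because
# a non-newline character follows it or because the input has ended.
sub paragraph_sep {
    my ($at_eof) = @_;
    my $eof = eof_anchor($at_eof);
    return qr/\n+(?=[^\n]|$eof)/;
}

my $buffered = "first\n\n";    # more newlines might still arrive

my $before = ( $buffered =~ paragraph_sep(0) ) ? 1 : 0;          # EOF not seen
my $after  = ( $buffered =~ paragraph_sep(1) ) ? 1 : 0;          # EOF seen
my $mid    = ( "first\n\nsecond" =~ paragraph_sep(0) ) ? 1 : 0;  # run is complete
```

With an anchor like this, a separator such as a run of newlines would not match while more of the run could still be read, but would still match at the true end of the input.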
--
tr/`4/ /d, print "@{[map --$| ? ucfirst lc : lc, split]},\n" for
pack 'u', pack 'H*', 'ab5cf4021bafd28972030972b00a218eb9720000';