perl.perl5.porters | Postings from January 2007

Re: New release ?

From: demerphq
Date: January 9, 2007 07:55
Subject: Re: New release ?
Message ID: 9b18b3110701090754h5c07a0dbp2581662b4b7603ec@mail.gmail.com
On 1/5/07, demerphq <demerphq@gmail.com> wrote:
> On 1/5/07, Rafael Garcia-Suarez <rgarciasuarez@gmail.com> wrote:
> > I'm trying to make a list of what needs to be sorted out before a new
> > development release: (5.9.5, which should be feature-equal to 5.10.0)
> >
> [...]
> > - more regexp work ? (Yves)
> [...]
>
> I have some core patches pending submission that should wrap up the
> pluggable regex engine interface, as well as reduce the memory
> required for storing stringified regexes.
>
> The only other work I can think of right now is some stuff relating to
> $REGMARK and $REGERROR, but I'll post a follow-up to explain more when I
> have some more time.
>
> Uhm, there /was/ something else, but right now I can't remember. :-)

To follow up on this, I believe the following regex-related issues
should be resolved before we release perl 5.10:

1. Stringification of regexes currently assumes that all patterns can
be wrapped in (?ms-ix:)-style containers when concatenated. This should
instead become an operation performed by the engine.
2. The stringified form is stored in magic structures rather than in
the regexp structure itself. This means stringification can leave two
or more copies of the pattern text in memory. It also makes certain
operations (like finding out which pattern is used in PL_curpm)
difficult, or achievable only by copying and pasting code.
3. The modifiers are hard-coded into toke.c; IMO, parsing of modifiers
should be handled by the regex engine.
4. The offsets code is now somewhat broken. It makes assumptions that
are no longer correct (such as that every opcode can be related to a
single contiguous piece of text in the input pattern), and it is
maintained for all patterns (apparently without being used). I believe
that at minimum the offsets logic should only run under debug. It
/could/ be useful for some forms of error reporting, but AFAICT it is
not used that way, and I'm not sure it's worth it anyway.
5. I'd like to make $REGERROR and $REGMARK into magic vars. In
hindsight, I think the rationale for making them package-scoped vars
was not sound.
6. I'm working on a patch right now that adds \v and \V as shortcuts
for (*PRUNE) and (*SKIP), as suggested by Jeffrey Friedl, and also adds
\K and \F, as suggested by Jeff Pinyan (japhy) some time back. I think
all four are potentially useful and worth taking advantage of.
7. Related to point 3, I'd really like to get a new modifier, /k (for
'keep'), which would force the copy-on-match behaviour for a specific
pattern regardless of whether it contains capture buffers or whatnot,
and then provide $^REG_STRING, $^REG_LEFT, $^REG_MATCH and $^REG_RIGHT
to give access to the copy, as equivalents to $`, $& and $' that would
NOT set PL_sawampersand. Thus, you could do:

  if ($_=~/blah/k) {
     print "${^REG_LEFT},${^REG_MATCH},${^REG_RIGHT}";
  }

and have it behave equivalently to

  if ($_=~/blah/) {
     print "$`,$&,$'";
  }

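except that the former would avoid the program-wide performance penalty
of the latter (once PL_sawampersand is set, every successful match in
the program copies its target string). I think this would be a really
useful change, with pretty good ROI in terms of dev time.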
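The penalty-free access in point 7 can be approximated with existing perls (this is not the proposed /k feature) by using the match offset arrays @- and @+, which record the start and end of the last successful match without mentioning $`, $& or $' anywhere in the code. A minimal sketch, assuming the subject string is not modified between the match and the substr calls; the copy that /k would keep is what would lift that restriction. Variable names are illustrative:

```perl
use strict;
use warnings;

my $str = "foo blah bar";
my ($left, $match, $right);

if ($str =~ /blah/) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    # Assumes $str is unchanged since the match succeeded.
    $left  = substr($str, 0, $-[0]);               # same text as the prematch
    $match = substr($str, $-[0], $+[0] - $-[0]);   # same text as the match
    $right = substr($str, $+[0]);                  # same text as the postmatch
    print "$left,$match,$right\n";                 # prints "foo ,blah, bar"
}
```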

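Point 1 concerns what happens when a qr// object is stringified or interpolated into a larger pattern: the pattern text is wrapped in a (?...:...) group carrying its modifiers. A minimal sketch of that behaviour from the Perl side; the exact wrapper text differs between perl versions, so the sketch only relies on it starting with "(?":

```perl
use strict;
use warnings;

my $re = qr/abc/i;

# Stringifying the qr// object yields the pattern inside a (?...:...)
# container that records its modifiers; the exact text is version-dependent.
my $str = "$re";
print "$str\n";

# Interpolating $re splices that container into the new pattern, so /i
# applies only to the inner "abc", not to the surrounding "x".
my $combined = qr/^x$re$/;
print "xABC: ", ("xABC" =~ $combined ? "match" : "no match"), "\n";
print "XABC: ", ("XABC" =~ $combined ? "match" : "no match"), "\n";
```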
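For context on point 5: $REGMARK is set by the (*MARK:NAME) backtracking control verb and is an ordinary package variable, not block-scoped like $1. A minimal sketch of that behaviour, assuming perl 5.10 or later (where the verbs are available); the mark names saw_x and saw_y are illustrative:

```perl
use strict;
use warnings;

our $REGMARK;   # package variable that receives the name of the last (*MARK:NAME)

{
    # The mark records which alternative matched, without a capture group.
    "y" =~ /x(*MARK:saw_x)|y(*MARK:saw_y)/ or die "no match";
}

# Unlike $1, which is dynamically scoped to the enclosing block,
# $REGMARK keeps its value after the block exits.
print "$REGMARK\n";   # prints "saw_y"
```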
Cheers,
yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


