Re: Benchmarking Pure Perl Trim Functions.

From: hv
Date: June 4, 2021 01:06
Subject: Re: Benchmarking Pure Perl Trim Functions.
Message ID: 202106040034.1540YTP31881@crypt.org

Nicholas Clark <nick@ccl4.org> wrote:
:More - I assumed that the regexes used to "trim spaces from the right" are
:
:1) simple to recognise
:2) there aren't that many variations of them
:
:So I set off with the goal of recognising them in the regex compiler, and
:then having re_intuit_start() implement them better.
:
:In the branch https://github.com/nwc10/perl5/tree/intiuit-rtrim

Cute stuff, I like it.

It would be better if the 'goto fail' were moved from dccb62dc33 to
ae13beca73, to save reviewers from writing long explanations of why
'return strpos' is wrong and then deleting them again.

The "Oh my" comment in the same commit would probably be better placed in
the commit message. I'm also not sure why you talk about setting it
"for now" to `strend - strbeg`; in this case, isn't that simply the right
answer? Probably more important is a comment in the 'else' branch noting
that every other case is fixed length.

121,600 new tests in re/rtrim.t seems like overkill to me.
On the other hand, it would be a good idea to add tests in re/opt.t
to verify that the optimization kicks in when and as it should.
(Sadly we don't get the RXf flags directly there; I couldn't work out
how to get at them in re::optimization().)
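
As a rough sketch of the kind of information such a test could look at, the
following dumps whatever re::optimization() reports for a right-trim pattern.
It assumes a perl whose re.pm exports optimization(), and falls back to a
message otherwise. The helper name optimization_report is made up for
illustration, and nothing here reaches the RXf flags.

```perl
use strict;
use warnings;

BEGIN {
    require re;
    # Import optimization() only if this perl's re.pm offers it.
    re->import('optimization')
        if grep { $_ eq 'optimization' } @re::EXPORT_OK;
}

# Hypothetical helper: return the optimization hashref for a compiled
# pattern, or undef if re::optimization() is unavailable.
sub optimization_report {
    my ($qr) = @_;
    return undef unless defined &optimization;
    return optimization($qr);
}

my $report = optimization_report(qr/\s+\z/);
if ($report) {
    # Print every field the regex compiler chose to expose.
    for my $key (sort keys %$report) {
        my $value = $report->{$key};
        printf "%-22s %s\n", $key, defined $value ? $value : '(undef)';
    }
}
else {
    print "re::optimization() not available on this perl\n";
}
```

A real test in re/opt.t would compare selected fields against expected
values rather than print them.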

I'm not sure why you think extending to /\s+\z/a isn't worth it; on the
face of it, it is a fairly trivial extension, and I imagine people are as
likely to use /a for speed reasons as for semantics.
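
To make the semantic side of /a concrete, here is a minimal sketch comparing
a default right-trim with its /a variant. The sub names are invented for the
example; it only shows what the two patterns match, not how fast they run.

```perl
use strict;
use warnings;

# Trim trailing whitespace with the default \s (Unicode rules apply
# once the string contains characters above 0xFF).
sub rtrim_default { my ($s) = @_; $s =~ s/\s+\z//;  return $s }

# Same trim, but /a restricts \s to ASCII whitespace.
sub rtrim_ascii   { my ($s) = @_; $s =~ s/\s+\z//a; return $s }

my $em_space = "\x{2003}";    # U+2003 EM SPACE: whitespace only under Unicode rules

for my $input ("value \t\n", "value$em_space", "value$em_space \n") {
    # Print lengths rather than the strings, to avoid wide-character output.
    printf "in=%d default=%d ascii=%d\n",
        length($input),
        length(rtrim_default($input)),
        length(rtrim_ascii($input));
}
```

The two subs agree on pure-ASCII input; with a trailing EM SPACE the /a
version stops at it and leaves it in place.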

In the commit messages and some comments you sometimes seem to write
[[:space:]] to imply /u and \s to imply /d, which is likely to lead to
confusion. I'd be inclined to use \s throughout.
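
A small sketch of why the spelling is not what matters: for a given modifier,
\s and [[:space:]] give the same answer, and it is /d versus /u that changes
the result. It assumes a script without "use utf8", "use locale" or the
unicode_strings feature, so that the test string stays a plain byte string.

```perl
use strict;
use warnings;

# NO-BREAK SPACE as a single byte; the string is not UTF-8 internally.
my $nbsp = "\xA0";

# Each row: pattern spelling, modifier, whether it matched $nbsp.
my @rows = (
    [ '\s',          'd', $nbsp =~ /\s/d          ? 1 : 0 ],
    [ '[[:space:]]', 'd', $nbsp =~ /[[:space:]]/d ? 1 : 0 ],
    [ '\s',          'u', $nbsp =~ /\s/u          ? 1 : 0 ],
    [ '[[:space:]]', 'u', $nbsp =~ /[[:space:]]/u ? 1 : 0 ],
);

printf "%-12s /%s  matched=%d\n", @$_ for @rows;
```

Both spellings fail to match under /d and both match under /u.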

And we probably need a more extensible mechanism for patterns with
special-case handling, that doesn't need an extra bit for each one;
I've been thinking we ought to be a lot freer to add more such
special cases. Feels like we should be able to have one bit for
RXf_SPECIALCASE, and store the specifics somewhere else (possibly
just as a custom regop).
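
Purely as an illustration of the shape of that idea, here is a toy model in
Perl: one boolean marks a pattern as special-cased, and the specifics live in
a separate table entry. Everything in it (the table, compile_pattern,
match_start) is hypothetical and says nothing about how the regex engine
would actually store or dispatch this.

```perl
use strict;
use warnings;

# Hypothetical table of special cases: adding one means adding an
# entry here, not a new flag.
my @SPECIAL_CASES = (
    {
        name  => 'rtrim_space',
        match => sub { $_[0] eq '\s+\z' },    # recognise the pattern text
        run   => sub {                        # scan backwards from the end
            my ($str) = @_;
            my $pos = length $str;
            $pos-- while $pos > 0 && substr($str, $pos - 1, 1) =~ /\s/;
            return $pos < length($str) ? $pos : undef;
        },
    },
);

# Compile a pattern and note, with a single flag, whether it is special.
sub compile_pattern {
    my ($src) = @_;
    my %rx = (qr => qr/$src/, special => 0, handler => undef);
    for my $case (@SPECIAL_CASES) {
        next unless $case->{match}->($src);
        @rx{qw(special handler)} = (1, $case);    # one flag, details elsewhere
        last;
    }
    return \%rx;
}

# Return the offset at which the match starts, or undef if no match.
sub match_start {
    my ($rx, $str) = @_;
    return $rx->{handler}{run}->($str) if $rx->{special};
    return $str =~ $rx->{qr} ? $-[0] : undef;
}

my $rtrim = compile_pattern('\s+\z');
printf "special=%d start=%d\n", $rtrim->{special}, match_start($rtrim, "abc  \n");
```

The generic path and the special-case path must agree on every input, which
is the property any such mechanism would have to preserve.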

Hugo
