develooper Front page | perl.perl5.porters | Postings from May 2021

Re: Revisiting trim

From: hv
Date: May 30, 2021 21:40
Subject: Re: Revisiting trim
Message ID: 202105302109.14UL95P29860@crypt.org

Karl Williamson <public@khwilliamson.com> wrote:
:On 5/29/21 1:37 AM, demerphq wrote:
[...]
:> But this question also illustrates the problem here. The regex engine
:> doesn't know how to go backwards.  [...]
:
:Maybe you and I should have a chat about what can and should be done to 
:improve the matching speed of right-anchored patterns.
:
:I suppose it is theoretically possible to create reverse 
:Perl_re_intuit_start() and S_find_byclass() functions, if one could wrap 
:one's mind around that, though the libc support is limited.  But I could 
:be wrong about the feasibility and it would be more work than anyone 
:would care to undertake.

FWIW, I think it is probably impossible for the general case of /pat\z/,
but for restricted cases (primarily those without captures) it might not
be so hard.

:But there are things that could be done.  It had never occurred to me 
:before that the hop_back functions could be called with large numbers. 
:Backing up in a UTF-8 string could be improved by a factor of 8 by doing 
:per-word operations.  (You load a whole word.  One can isolate and count 
:the continuation bytes in it by some shifting/masking/etc. operations.
:Everything that isn't a continuation byte marks a character.) 
:Similarly, functions like S_find_next_masked() could have a 
:corresponding reversed version, though slower on UTF-8 than the forward 
:because of the forward bias of UTF-8.

Yes, that sounds good.
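
For illustration, the per-word counting Karl describes might look roughly
like the sketch below. This is not code from the perl source: the names
count_continuations() and hop_back_sketch() are hypothetical, and the
sketch assumes a 64-bit word, well-formed UTF-8, and a starting position
that lies on a character boundary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of UTF-8 continuation bytes (10xxxxxx) among the 8 bytes of w. */
static unsigned count_continuations(uint64_t w)
{
    const uint64_t ones = UINT64_C(0x0101010101010101);
    /* The low bit of each byte is set iff that byte has bit 7 set and
     * bit 6 clear, i.e. it is a continuation byte. */
    uint64_t is_cont = (w >> 7) & (~w >> 6) & ones;
    /* Every byte is now 0 or 1, so the multiply sums them into the top
     * byte; the total is at most 8 and cannot overflow. */
    return (unsigned)((is_cont * ones) >> 56);
}

/* Hypothetical helper: move back n characters from pos, stopping at start.
 * Assumes well-formed UTF-8 with pos and start on character boundaries. */
static const unsigned char *
hop_back_sketch(const unsigned char *start, const unsigned char *pos, size_t n)
{
    /* Whole words: 8 bytes hold at most 8 character starts, so while
     * n >= 8 we can consume a full word without overshooting. */
    while (n >= 8 && pos - start >= 8) {
        uint64_t w;
        memcpy(&w, pos - 8, sizeof w);      /* safe unaligned load */
        n -= 8 - count_continuations(w);    /* non-continuations are starts */
        pos -= 8;
    }
    /* Remainder: step back one byte at a time to the nth character start. */
    while (n > 0 && pos > start) {
        pos--;
        if ((*pos & 0xC0) != 0x80)
            n--;
    }
    return pos;
}
```

Because only a count is taken from each word, byte order within the word
does not matter. A real implementation would also have to decide what to
do with malformed input, which this sketch ignores.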

Hugo
