
Re: Revisiting trim

From: demerphq
Date: May 29, 2021 07:47
Subject: Re: Revisiting trim
Message ID: CANgJU+VwTg1_cB1991WE4pgDKbAp-yfELR269wj6XmAgAy4nUg@mail.gmail.com

On Fri, 28 May 2021 at 22:31, David Nicol <davidnicol@gmail.com> wrote:
>
>
>
> On Fri, May 28, 2021 at 11:25 AM Joseph Brenner <doomvox@gmail.com> wrote:
>>
>> André Warnier (tomcat/perl) <aw@ice-sa.com> wrote:
>>
>> > $stripped_line =~ s/^\s+//; $stripped_line =~ /\s+$//; # or only one of those, depends
>>
>> > Is /that/ the worst possible way ? or if not *the* worst, was there a better way all along ? (*)
>>
>> That's a very reasonable way of doing it which may very well be the
>> best way (though you dropped an "s" on the second "s///").
>>
>> They were probably referring to a tendency of many programmers to
>> obsess with trimming the left and right with a single s/// operation,
>> which will result in a hairy, unreadable solution that won't perform
>> any better than just doing it in two steps.
>
>
> Is this really slower? Is this really hairier and less readable than the two-step approach?
>
>      $reference_identifier =~ s/^\s*(.+?)\s*$/$1/;  # how I usually full-trim a reference identifier

This avoids the killer aspect of s/^\s+|\s+$//g, but it still scales
in proportion to the length of the string and the number of
space/non-space sequences in the string. The overhead will be quite a
bit higher, and I assume you want to make the . match newlines?
Consider that this won't work the same as the other examples on a
string like " foo\nbar ".
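
To make the newline point concrete, here is a minimal sketch that runs
the quoted one-liner, a variant with /s added, and the two-step trim
over that same string. The sub names are made up for illustration and
are not from the thread.

```perl
use strict;
use warnings;

# The single substitution from the quoted message, without /s.
sub trim_capture {
    my ($s) = @_;
    $s =~ s/^\s*(.+?)\s*$/$1/;
    return $s;
}

# Same pattern with /s, so that . also matches "\n".
sub trim_capture_s {
    my ($s) = @_;
    $s =~ s/^\s*(.+?)\s*$/$1/s;
    return $s;
}

# The two-step trim discussed earlier in the thread.
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

my $input = " foo\nbar ";
for my $trim (\&trim_capture, \&trim_capture_s, \&trim_two_step) {
    # Show the embedded newline as a literal \n so each result stays on one line.
    (my $shown = $trim->($input)) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```

Without /s the . cannot cross the embedded newline, so the first
substitution fails to match and leaves the string untouched. The other
two both return "foo\nbar".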

The reality is that the regex engine is a crappy way to do this
particular task. To do it right you want to start from the right-hand
side and search left, such that your performance is proportional to
the number of characters being removed. The regex engine, no matter how
you slice it, is going to go left to right, and is thus at best going
to be proportional to the length of the string overall.
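
As a rough illustration of the right-to-left idea only, the scan could
be sketched in pure Perl like this. rtrim_scan is a hypothetical name,
and this is not a performance claim: the sub copies its argument, tests
each character with a one-character regex, and leans on substr, so it
only shows the direction of the scan.

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing whitespace by walking backwards
# from the end of the string, stopping at the first non-space character.
sub rtrim_scan {
    my ($str) = @_;
    my $end = length $str;

    # Step left while the character just before $end is whitespace.
    $end-- while $end > 0 && substr($str, $end - 1, 1) =~ /\s/;

    # Keep everything up to (but not including) the trailing whitespace.
    return substr($str, 0, $end);
}

print "[", rtrim_scan("foo bar \t\n"), "]\n";
```

The loop body runs once per trailing whitespace character, which is
the property described above; whitespace elsewhere in the string is
never examined.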

TBH, I would not be surprised if:

chop($str) while $str=~/\s\z/;

or

1 while $str=~s/\s\z//;

is actually one of the fastest ways to do this with a regex. I
believe in these cases the regex engine does actually use the
utf8-skip-backwards macros (e.g. it knows how to find the position that
is K characters before the end of the string to see if they match a
space character; it does not know how to scan from the right to find
the maximal set of space characters).
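
For anyone who wants to measure rather than guess, here is a hedged
sketch that wraps the two loops above, plus a plain s/\s+\z// for
comparison, in subs and hands them to the core Benchmark module. The
sub names and the sample string are assumptions made for illustration,
and no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Remove one trailing whitespace character per iteration with chop.
sub rtrim_chop {
    my ($str) = @_;
    chop($str) while $str =~ /\s\z/;
    return $str;
}

# Remove one trailing whitespace character per iteration with s///.
sub rtrim_subst {
    my ($str) = @_;
    1 while $str =~ s/\s\z//;
    return $str;
}

# Single substitution with a quantifier, for comparison.
sub rtrim_plus {
    my ($str) = @_;
    $str =~ s/\s+\z//;
    return $str;
}

# Run each variant for at least one CPU second and print a comparison table.
sub compare_rtrims {
    my ($sample) = @_;
    cmpthese(-1, {
        chop  => sub { rtrim_chop($sample) },
        subst => sub { rtrim_subst($sample) },
        plus  => sub { rtrim_plus($sample) },
    });
}
```

Calling compare_rtrims(("word " x 10_000) . "   ") would exercise a long
string with many internal spaces and a short whitespace tail. Note that
each sub copies its argument, which adds the same fixed cost to all
three variants.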

So yes, frankly, as someone intimate with the regex engine, I would say
that this is a task that people should NOT use the regex engine for at
all. Unfortunately, to do this really right as a function you need to
do it in C.

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
