
Re: Revisiting trim

From: mah.kitteh via perl5-porters
Date: May 29, 2021 02:51
Subject: Re: Revisiting trim
Message ID: O_Nu4S3PiMwm930St2a9mmr4x8mBuxa8wcZOpPP8o4gerSKvESVhingmsZhPTucrx6apYMKKeJ6W9pz7I6wPp16QAfKOmdc1vvPjuBOSXA4=@protonmail.ch

The following way to "trim" using "split" seems to provide a constant time solution, not dependent on the length of the string, at least over the lengths tested here. Although I don't know how "split" is implemented, its constancy is not surprising.

In fact, the filthy way I'm generating the strings very quickly takes more time than the trim itself.

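# bench.sh
# Time x.pl on inputs of increasing length (1 to 100 repetitions of ' a b ').
for NUM in $(seq 1 100);
do
  # Build the test string, padded with two extra spaces on each side.
  STRING=$(perl -e "printf qq{  %s  }, ' a b ' x $NUM")
  # Each iteration starts a new perl process, so startup cost is part of 'real'.
  time perl x.pl "$STRING" 2>&1 | grep real
done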

# x.pl
my $foo = $ARGV[0];
# Split on leading or trailing whitespace; the last field is the trimmed string.
my $trimmed = (split /^\s*|\s*$/, $foo)[-1];
print qq{'$trimmed'\n}; # <- commenting this out provides no benefit timewise
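
To make the behaviour of the split concrete, here is a minimal sketch that wraps the same expression in a helper and runs it over a few inputs. The sub name trim_split, the sample strings, and the fallback to an empty string are assumptions added for illustration; they are not part of x.pl.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the split-based trim from x.pl.
sub trim_split {
    my ($s) = @_;
    # The pattern matches whitespace anchored at the start or at the end.
    # A leading match produces an empty first field, and split drops
    # trailing empty fields, so the last field is the trimmed text.
    my ($last) = (split /^\s*|\s*$/, $s)[-1];
    # Assumption: an empty or all-whitespace input can leave no usable
    # field, so fall back to the empty string instead of undef.
    return defined $last ? $last : '';
}

# Illustrative inputs: padded on both sides, one side only, or not at all.
for my $s ("  a b  ", "a b  ", "  a b", "a b", "   ") {
    printf "'%s' -> '%s'\n", $s, trim_split($s);
}
```

Interior whitespace is left alone because neither alternative can match in the middle of the string: ^ only matches at the start, and \s*$ only matches when the whitespace runs to the end.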


Excerpt of output ('real' bounces between 7ms and 16ms, which suggests a sensitivity to the macOS process scheduler itself and is even more indicative of the efficiency of this solution):


real    0m0.007s
user    0m0.002s
sys     0m0.004s
'a ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba b'

real    0m0.007s
user    0m0.002s
sys     0m0.003s
'a ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba b'

real    0m0.015s
user    0m0.003s
sys     0m0.006s
'a ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba b'

real    0m0.007s
user    0m0.002s
sys     0m0.003s
'a ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba b'

real    0m0.008s
user    0m0.002s
sys     0m0.004s
'a ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba ba b'

real    0m0.008s
user    0m0.002s
sys     0m0.003s
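
Since every run above starts a fresh perl process, the 'real' figures include process startup as well as the trim. One way to separate the two is to time the split in-process. The following is only a sketch under assumptions: it uses the core Time::HiRes module, the helper name time_trim is made up, and the repetition and iteration counts are arbitrary. No timings are shown because none were measured for this post.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run the split-based trim $iters times on a string
# built the same way as in bench.sh, and return (elapsed seconds, result).
sub time_trim {
    my ($reps, $iters) = @_;
    my $string = sprintf qq{  %s  }, ' a b ' x $reps;
    my $trimmed;
    my $start = Time::HiRes::time();
    for (1 .. $iters) {
        $trimmed = (split /^\s*|\s*$/, $string)[-1];
    }
    return (Time::HiRes::time() - $start, $trimmed);
}

# Arbitrary sizes; grow $reps further to look for length dependence.
for my $reps (1, 10, 100, 1_000) {
    my ($elapsed, $trimmed) = time_trim($reps, 1_000);
    printf "%5d reps, %5d chars after trim: %.4fs for 1000 trims\n",
        $reps, length($trimmed), $elapsed;
}
```

If the per-trim time stays flat as $reps grows, that supports the constant-time reading; if it grows with the string, the flat numbers above are more likely dominated by process startup.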

Cheers,
Brett

--
oodler@cpan.org
√https://github.com/oodler577
#pdl #p5p #p7-dev #native @ irc.perl.org


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, May 28, 2021 5:52 PM, Joseph Brenner <doomvox@gmail.com> wrote:

> Some quick-and-dirty benchmarking, trimming 100,000 short strings:
>
> case 1:
> $line =~ s/^\s+//;
> $line =~ s/\s+$//;
>
> real 0m1.427s
>
> ==============
>
> case 2:
> $line =~ s/^\s*(.+?)\s*$/$1/;
>
> real 0m1.853s
>
> ==============
>
> case 3:
> $line =~ s/^\s*|\s*$//g;
>
> real 0m2.864s
>
> ==============
>
> So, case 2 is 30% slower, case 3 is 100% slower.
>
> There's a simple fix that improves case 3 quite a bit:
>
> case 4:
> $line =~ s/^\s+|\s+$//g;
>
> real 0m1.704s
>
> ==============
>
> However: I took it very easy on this case using short lines... it's
> very sensitive to line length (that /g is checking every point in the
> string) and it slows down by a factor of ten with lines that are only
> around 80 chars long.
>
> Anyway, these speed penalties are Not Good, but they're also not
> (usually) a reason to care.
> Granted I was exaggerating calling these hairy and
> unreadable, but I think they're all harder to read.
>
> (For example, with "case 3", my first thought was it was
> broken and wouldn't strip trailing whitespace if it
> had stripped leading whitespace, but then I noticed the /g.
> And further, it's using a * instead of a +, so without the /g
> it never strips trailing space: so there were two things
> I didn't understand.)
>
> The thing you should ask yourself as a perl programmer is
> "what did I think I would gain from doing this in one
> line?".
>
> The key point for the perl5-porters though is that there
> is indeed a need for a built-in trim.
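
The four quoted cases, and the remark about case 3 without the /g, can be checked side by side. This is a small sketch with a made-up sample string and made-up hash keys; it only compares results and says nothing about speed.

```perl
use strict;
use warnings;

# Each entry applies one of the quoted substitutions to a copy of the input.
my %trim = (
    case1      => sub { my $l = shift; $l =~ s/^\s+//; $l =~ s/\s+$//; $l },
    case2      => sub { my $l = shift; $l =~ s/^\s*(.+?)\s*$/$1/; $l },
    case3      => sub { my $l = shift; $l =~ s/^\s*|\s*$//g; $l },
    case4      => sub { my $l = shift; $l =~ s/^\s+|\s+$//g; $l },
    # Case 3 with the /g dropped, to illustrate the point made in the quote.
    case3_no_g => sub { my $l = shift; $l =~ s/^\s*|\s*$//; $l },
);

my $sample = "  a b  ";    # illustrative input, not from the benchmark
for my $name (sort keys %trim) {
    printf "%-10s '%s'\n", $name, $trim{$name}->($sample);
}
```

Without /g, the first alternative ^\s* always succeeds at the start of the string, even when it matches nothing, so the trailing whitespace is never reached.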


