develooper Front page | perl.perl5.porters | Postings from March 2021

Re: Let's talk about trim() so more

From: Ben Bullock
Date: March 31, 2021 13:02
Subject: Re: Let's talk about trim() so more
Message ID: CAN5Y6m94H6TUCrHVR4-Ek72zq5tQPACxRSzVqKWy0DA5LhznNg@mail.gmail.com

On Wed, 31 Mar 2021 at 08:59, <neilb@neilb.org> wrote:

> Trimming is something that is frequently wanted, but though you
> think it a no-brainer, people don’t always get it right. I found
> about 7500 distributions with an "inline trim". Here are some of the
> ones I found:
>
>     s/(^\s+)|(\s+$)//;
>     s/(^\s+)|\n//gm;
>     s/(^\s+|\s+$)//g;
>     s/(^\s+)|(\s+$)//g;
>     s/(^\s+|\s+$)//os;
>     s/(^\s+|\s+$)//gs;
>     s/^\s*//; s/\s+$//;
>
> Not all of those work.

There are any number of "gotcha" failures using regex trim on CPAN and
even within Perl core modules. Further, at least two core module
authors have duplicated "trim".

A search for "trim string" on metacpan.org finds
https://metacpan.org/pod/POOF:

                # trim leading and trailing white spaces
                $val =~ s/^\s*|\s*$//;

Because of the asterisk, this substitution returns true even if it
removes nothing, and because the /g flag is missing, it fails to remove
trailing whitespace if there is also leading whitespace.

    $ perl -e 'my $g="   x   ";$g=~s/^\s+|\s+$//;print "!$g!\n";'
    !x   !
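
For contrast, here is a minimal sketch of two forms that do trim both
ends, next to the form without /g. The sub names (trim_g, trim_two_step,
trim_no_g) are hypothetical and chosen only for this illustration; they
are not taken from any module mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: one substitution with /g, so the pattern is
# applied again after the leading match and can reach the trailing one.
sub trim_g {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: two separately anchored substitutions,
# no alternation and no /g needed.
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# The form without /g, for comparison: only the first match is replaced.
sub trim_no_g {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//;
    return $s;
}

print "!", trim_g("   x   "),        "!\n";   # !x!
print "!", trim_two_step("   x   "), "!\n";   # !x!
print "!", trim_no_g("   x   "),     "!\n";   # !x   !
```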

We can find many more examples of the "omitted /g" error on CPAN:

    https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs%2B%7C%5Cs%2B%24%2F%2F%5CE%5B%5Eg%5D*%24&qd=&qft=

Using * instead of + after \s causes the substitution to always return
a true value even if nothing changed. This is also fairly common:

    https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qd=&qft=
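
A small sketch of the return-value difference, assuming a string with no
whitespace at either end. The sub names (star_returns_true,
plus_returns_true) are hypothetical, made up for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether the "*" form of the substitution
# returns a true value for the given string.
sub star_returns_true {
    my ($s) = @_;
    # ^\s* can match the empty string at position 0, so this always "succeeds".
    return ($s =~ s/^\s*|\s*$//) ? 1 : 0;
}

# Hypothetical helper: the same check for the "+" form.
sub plus_returns_true {
    my ($s) = @_;
    # \s+ needs at least one whitespace character, so this can fail.
    return ($s =~ s/^\s+|\s+$//) ? 1 : 0;
}

print star_returns_true("x"), "\n";    # 1, although nothing was removed
print plus_returns_true("x"), "\n";    # 0
print plus_returns_true(" x"), "\n";   # 1
```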

The search reports "80 distributions". I looked through all of them and
found no place where the return value of the substitution was being
used, perhaps because that bug would have been caught quickly, except
for here:

    https://grep.metacpan.org/search?qci=&q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qft=&qd=CohortExplorer

where the programmer seems actually to be using the fact that it
always returns a true value.

Furthermore, there are several examples in Perl core modules.

Mistaken use of the /s flag (make . match \n) to mean /m (make ^ and $
match at line boundaries inside the string) is seen in such modules as
Pod::Simple, CPAN::Module, Net::SMTP, Pod::Checker, Locale::Maketext,
and I18N::LangTags.
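
To make the /s versus /m difference concrete, here is a sketch on a
two-line string. The pattern and the sub names (trim_with_s, trim_with_m)
are illustrative assumptions; they are not copied from any of the modules
listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: /s only changes what "." matches, and this pattern
# has no ".", so ^ and $ still anchor to the whole string.
sub trim_with_s {
    my ($t) = @_;
    $t =~ s/^\s+|\s+$//gs;
    return $t;
}

# Hypothetical helper: /m lets ^ and $ match at the start and end of
# every line, so each line is trimmed.
sub trim_with_m {
    my ($t) = @_;
    $t =~ s/^\s+|\s+$//gm;
    return $t;
}

my $text = "  a  \n  b  ";

# With /s, only the outer ends of the string are trimmed.
print trim_with_s($text) eq "a  \n  b" ? "s: outer ends only\n" : "s: other\n";

# With /m, both lines are trimmed.
print trim_with_m($text) eq "a\nb" ? "m: every line\n" : "m: other\n";
```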

Mistaken use of s/^\s*// for trimming is seen in core modules like
Win32, bigint.pm, and CPAN::Complete.

I also found one example of s/^\s+|\s+$// (omitted /g flag means it
fails to remove the end space from " this ") in the core modules, in
ExtUtils::CBuilder::Platform::Windows:

    map {$a=$_;$a=~s/\t/ /g;$a=~s/^\s+|\s+$//;$a}

Individual core modules which implement their own "trim" function
include ExtUtils::ParseXS (trim_whitespace) and TAP::Parser (_trim).
