perl.perl5.porters | Postings from March 2021

Re: Let's talk about trim() so more

From: Scott Baker
Date: March 31, 2021 15:04
Subject: Re: Let's talk about trim() so more
Message ID: b4e34adf-188d-f4ef-a0e0-e9cc038940d8@perturb.org

Excellent research, Mr. Bullock!

You bring up a very good point, one that Grinnz came across
<https://www.reddit.com/r/perl/comments/hf3jlx/announcing_perl_7/fvwp1zt/?utm_source=reddit&utm_medium=web2x&context=3>
when we first started implementing trim(). One of the main reasons
we want it in core is that it's implemented so many times in
other places, and *often* implemented *incorrectly*. With it in core,
we can implement it correctly once and stop developers from having to
reinvent the wheel.

In your research, did you find out whether people who implement trim()
as a sub modify the string in place or return a new value? That seems
to be hotly debated right now.
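Scott's question is about calling convention. As a minimal sketch of the two styles in plain Perl (the sub names `trim_copy` and `trim_inplace` are hypothetical, chosen for illustration; they are not a core or CPAN API):

```perl
use strict;
use warnings;

# Return-value style (hypothetical name): the argument is left untouched
# and a trimmed copy is returned.
sub trim_copy {
    my ($s) = @_;
    return undef unless defined $s;
    $s =~ s/\A\s+|\s+\z//g;    # /g so both ends are handled
    return $s;
}

# In-place style (hypothetical name): modifies the caller's variable
# through the @_ alias and returns nothing useful.
sub trim_inplace {
    return unless defined $_[0];
    $_[0] =~ s/\A\s+|\s+\z//g;
    return;
}

my $orig  = "  hello world \n";
my $copy  = trim_copy($orig);    # $orig keeps its whitespace
my $other = "\t both ends  ";
trim_inplace($other);            # $other itself is now trimmed

print "[$copy] [$other]\n";
```

The return-value style also works on literals and expression results, whereas the in-place style writes through the `@_` alias and should die with "Modification of a read-only value attempted" if handed a literal that needs trimming. `\A` and `\z` are used instead of `^` and `$` so the anchors always mean the absolute ends of the string, independent of `/m` or a final newline.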

- Scott

On 3/31/21 6:01 AM, Ben Bullock wrote:
> On Wed, 31 Mar 2021 at 08:59, <neilb@neilb.org> wrote:
>
>> Trimming is something that is frequently wanted, but though you
>> think it a no-brainer, people don't always get it right. I found
>> about 7500 distributions with an "inline trim". Here are some of the
>> ones I found:
>>
>>     s/(^\s+)|(\s+$)//;
>>     s/(^\s+)|\n//gm;
>>     s/(^\s+|\s+$)//g;
>>     s/(^\s+)|(\s+$)//g;
>>     s/(^\s+|\s+$)//os;
>>     s/(^\s+|\s+$)//gs;
>>     s/^\s*//; s/\s+$//;
>>
>> Not all of those work.
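To see which of the patterns above actually work, each can be run against a string with whitespace at both ends. The following is a small illustrative harness (the input `"  this  "` is our own choice); each pattern is wrapped in an anonymous sub so it is applied to a fresh copy.

```perl
use strict;
use warnings;

# Each entry: [ pattern as quoted above, sub applying it to a copy ].
my @candidates = (
    [ 's/(^\s+)|(\s+$)//;',  sub { my $s = shift; $s =~ s/(^\s+)|(\s+$)//;  $s } ],
    [ 's/(^\s+)|\n//gm;',    sub { my $s = shift; $s =~ s/(^\s+)|\n//gm;    $s } ],
    [ 's/(^\s+|\s+$)//g;',   sub { my $s = shift; $s =~ s/(^\s+|\s+$)//g;   $s } ],
    [ 's/(^\s+)|(\s+$)//g;', sub { my $s = shift; $s =~ s/(^\s+)|(\s+$)//g; $s } ],
    [ 's/(^\s+|\s+$)//os;',  sub { my $s = shift; $s =~ s/(^\s+|\s+$)//os;  $s } ],
    [ 's/(^\s+|\s+$)//gs;',  sub { my $s = shift; $s =~ s/(^\s+|\s+$)//gs;  $s } ],
    [ 's/^\s*//; s/\s+$//;', sub { my $s = shift; $s =~ s/^\s*//; $s =~ s/\s+$//; $s } ],
);

my $input = "  this  ";
my %works;
for my $c (@candidates) {
    my ($label, $apply) = @$c;
    my $got = $apply->($input);
    $works{$label} = ($got eq 'this') ? 1 : 0;
    printf "%-24s %-4s [%s]\n", $label, ($works{$label} ? 'ok' : 'FAIL'), $got;
}
```

Run as-is, the harness should report the first, second, and fifth patterns as failing: none of them removes the trailing spaces, and the second one would additionally delete embedded newlines.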
> There are any number of "gotcha" failures using regex trim on CPAN and
> even within Perl core modules. Further, at least two core module
> authors have duplicated "trim".
>
> A search for "trim string" on metacpan.org finds
> https://metacpan.org/pod/POOF:
>
>                 # trim leading and trailing white spaces
>                 $val =~ s/^\s*|\s*$//;
>
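> Because of the asterisks, this substitution returns true even when
> there is nothing to remove. And because there is no /g flag, it makes
> at most one substitution: ^\s* always matches at the start of the
> string (as an empty match if need be), so trailing whitespace is never
> removed. Even with + instead of *, the missing /g leaves trailing
> whitespace behind whenever there is also leading whitespace: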
>
>     $ perl -e 'my $g="   x   ";$g=~s/^\s+|\s+$//;print "!$g!\n";'
>     !x   !
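Adding the /g flag is enough to fix the one-liner above. A minimal sketch comparing the two forms on the same input:

```perl
use strict;
use warnings;

my $without_g = "   x   ";
$without_g =~ s/^\s+|\s+$//;     # stops after the leading match

my $with_g = "   x   ";
$with_g =~ s/^\s+|\s+$//g;       # keeps going and also removes the trailing run

print "!$without_g!\n";          # !x   !
print "!$with_g!\n";             # !x!
```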
>
> We can find many more examples of the "omitted /g" error on CPAN:
>
>     https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs%2B%7C%5Cs%2B%24%2F%2F%5CE%5B%5Eg%5D*%24&qd=&qft=
>
> Using * instead of + after \s causes the substitution to always return
> a true value even if nothing changed. This is also fairly common:
>
>     https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qd=&qft=
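The difference in return values is easy to check. s/// returns the number of substitutions made, or false when there are none, so a pattern that can match an empty string reports success on input containing no whitespace at all. A small sketch:

```perl
use strict;
use warnings;

my $star = "x";
my $star_result = ($star =~ s/^\s*|\s*$//g);   # empty matches still count

my $plus = "x";
my $plus_result = ($plus =~ s/^\s+|\s+$//g);   # nothing to remove

printf "star: %s, plus: %s\n",
    ($star_result ? 'true' : 'false'),
    ($plus_result ? 'true' : 'false');          # star: true, plus: false
# Both strings are still "x"; only the + version reports that accurately.
```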
>
> It says "80 distributions". I looked through all of them but I didn't
> find anywhere where the return value of the substitution was being
> used, perhaps because that bug would have been caught quickly, except
> for here:
>
>     https://grep.metacpan.org/search?qci=&q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qft=&qd=CohortExplorer
>
> where the programmer seems actually to be using the fact that it
> always returns a true value.
>
> Furthermore, there are several examples in Perl core modules.
>
> Mistaken use of the /s flag (make . match \n) where /m (make ^ and $
> match at the start and end of each line) was meant is seen in such
> modules as Pod::Simple, CPAN::Module, Net::SMTP, Pod::Checker,
> Locale::Maketext, and I18N::LangTags.
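The /s and /m flags are easy to confuse: /s changes only what `.` matches, while /m changes where `^` and `$` can match. A sketch on a made-up two-line string, trimming spaces only so that `\n` itself is not treated as trimmable:

```perl
use strict;
use warnings;

my $text = "  a  \n  b  ";

# /s: ^ and $ still anchor only at the ends of the whole string.
(my $with_s = $text) =~ s/^ +| +$//gs;

# /m: ^ and $ also match just after and just before each embedded "\n".
(my $with_m = $text) =~ s/^ +| +$//gm;

print "/s: [$with_s]\n";   # ends trimmed, inner spaces kept
print "/m: [$with_m]\n";   # every line trimmed
```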
>
> Mistaken use of s/^\s*// for trimming is seen in core modules like
> Win32, bigint.pm, and CPAN::Complete.
>
> I also found one example of s/^\s+|\s+$// in the core modules, in
> ExtUtils::CBuilder::Platform::Windows. Because the /g flag is omitted,
> it fails to remove the trailing space from " this ":
>
>     map {$a=$_;$a=~s/\t/ /g;$a=~s/^\s+|\s+$//;$a}
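For comparison, a sketch of how that map block might be written so it trims both ends. It also uses a lexical variable instead of assigning to `$a`, the package variable that `sort` uses. The input list is made up for illustration.

```perl
use strict;
use warnings;

my @raw = (" this ", "\tlib\t", "plain");

my @clean = map {
    my $s = $_;            # copy, so @raw is not modified through $_
    $s =~ s/\t/ /g;        # tabs to spaces, as in the original
    $s =~ s/^\s+|\s+$//g;  # /g: strip leading AND trailing whitespace
    $s;
} @raw;

print join('|', @clean), "\n";   # this|lib|plain
```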
>
> Individual core modules which implement their own "trim" function
> include ExtUtils::ParseXS (trim_whitespace) and TAP::Parser (_trim).
>

