Re: Let's talk about trim() so more
From:
Philip R Brenan
Date:
March 31, 2021 15:22
Subject:
Re: Let's talk about trim() so more
Message ID:
CALhwFR=axGTMvU74odp679MFb6M9KXqdg+ANPbgafWhKDW2kiw@mail.gmail.com
The issue is not whether there should be a trim function or not: it is
whether trim should be in the core or in CPAN. Surely there are lots of
other interesting string functions that are not currently in the core but
are in CPAN - where does one draw the line? Why does trim() get special
status and not other equally interesting functions? Can we be sure that
nothing will break if trim is in the core?
On Wed, Mar 31, 2021 at 4:04 PM Scott Baker <scott@perturb.org> wrote:
> Excellent research Mr. Bullock!
>
> You bring up a very good point Grinnz came across
> <https://www.reddit.com/r/perl/comments/hf3jlx/announcing_perl_7/fvwp1zt/?utm_source=reddit&utm_medium=web2x&context=3>
> when we initially started implementing trim(). One of the main reasons we
> want it done in core is because it's implemented so many times in other
> places, and *often* implemented *incorrectly*. By putting it in core, we can
> implement it correctly once and spare developers from reinventing the
> wheel.
>
> In your research did you find out if people implementing trim() as a sub
> do it in-place or as a return value? That seems to be hotly debated right
> now.
>
> - Scott
>
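For readers following the in-place versus return-value question, here is a minimal sketch of the two styles. The names trimmed() and trim_in_place() are hypothetical and chosen only for illustration; neither is the proposed core API.

```perl
use strict;
use warnings;

# Hypothetical name: returns a trimmed copy, leaving the argument untouched.
sub trimmed {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Hypothetical name: modifies the caller's variable through the @_ alias.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

my $original = "  hello  ";
my $copy     = trimmed($original);    # $original is unchanged at this point
my $before   = $original;             # snapshot taken before the in-place call
trim_in_place($original);             # $original itself is now trimmed

print "copy:     !$copy!\n";
print "before:   !$before!\n";
print "original: !$original!\n";
```

The return-value style composes inside expressions, while the in-place style avoids a copy but silently changes the caller's data.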
> On 3/31/21 6:01 AM, Ben Bullock wrote:
>
> On Wed, 31 Mar 2021 at 08:59, <neilb@neilb.org> wrote:
>
>
> Trimming is something that is frequently wanted, but though you
> think it a no-brainer, people don't always get it right. I found
> about 7500 distributions with an "inline trim". Here are some of the
> ones I found:
>
> s/(^\s+)|(\s+$)//;
> s/(^\s+)|\n//gm;
> s/(^\s+|\s+$)//g;
> s/(^\s+)|(\s+$)//g;
> s/(^\s+|\s+$)//os;
> s/(^\s+|\s+$)//gs;
> s/^\s*//; s/\s+$//;
>
> Not all of those work.
>
> There are any number of "gotcha" failures using regex trim on CPAN and
> even within Perl core modules. Further, at least two core module
> authors have duplicated "trim".
>
> A search for "trim string" on metacpan.org finds https://metacpan.org/pod/POOF:
>
> # trim leading and trailing white spaces
> $val =~ s/^\s*|\s*$//;
>
> This substitution will return true even if it matches nothing due to
> the asterisk, and it will fail to remove trailing whitespace if there
> is also leading whitespace due to the lack of a /g flag.
>
> $ perl -e 'my $g=" x ";$g=~s/^\s+|\s+$//;print "!$g!\n";'
> !x !
>
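The two problems described above (the missing /g flag, and * matching the empty string) can be reproduced side by side. This is only a sketch; the variable names are illustrative and not taken from any of the modules mentioned.

```perl
use strict;
use warnings;

my $input = " x ";

# Without /g the alternation is applied once: the leading run is removed
# and the trailing run is left in place.
(my $no_g = $input) =~ s/^\s+|\s+$//;

# With /g both alternatives get a chance to match.
(my $with_g = $input) =~ s/^\s+|\s+$//g;

# With * the pattern can match the empty string, so the substitution
# reports success even when there is nothing to trim.
my $clean       = "x";
my $star_result = ($clean =~ s/^\s*|\s*$//) ? 1 : 0;
my $plus_result = ($clean =~ s/^\s+|\s+$//) ? 1 : 0;

print "no /g:   !$no_g!\n";
print "with /g: !$with_g!\n";
print "star reports change: $star_result\n";
print "plus reports change: $plus_result\n";
```

Only the /g variant with + both trims the two ends and reports truthfully whether anything was removed.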
> We can find many more examples of the "omitted /g" error on CPAN:
>
> https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs%2B%7C%5Cs%2B%24%2F%2F%5CE%5B%5Eg%5D*%24&qd=&qft=
>
> Using * instead of + after \s causes the substitution to always return
> a true value even if nothing changed. This is also fairly common:
>
> https://grep.metacpan.org/search?q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qd=&qft=
>
> It says "80 distributions". I looked through all of them but I didn't
> find anywhere where the return value of the substitution was being
> used, perhaps because that bug would have been caught quickly, except
> for here:
>
> https://grep.metacpan.org/search?qci=&q=%5CQs%2F%5E%5Cs*%7C%5Cs*%24%2F%2F&qft=&qd=CohortExplorer
>
> where the programmer seems actually to be using the fact that it
> always returns a true value.
>
> Furthermore, there are several examples in Perl core modules.
>
> Mistaken use of the /s flag (make . match \n) to mean /m (make ^ and $
> match new lines) is seen in such modules as Pod::Simple, CPAN::Module,
> Net::SMTP, Pod::Checker, Locale::Maketext, and I18N::LangTags.
>
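The /s versus /m distinction mentioned above is easy to see on a multi-line string. A minimal sketch follows; the sample text is made up, and the /m variant uses [ \t] instead of \s as an illustrative choice so that the newlines themselves are never consumed.

```perl
use strict;
use warnings;

my $text = "  first  \n  second  \n";

# /s only changes what . matches; ^ and $ still anchor to the string as a
# whole, so only the outer edges are trimmed.
(my $with_s = $text) =~ s/^\s+|\s+$//gs;

# /m lets ^ and $ match at every line boundary, so each line is trimmed.
(my $with_m = $text) =~ s/^[ \t]+|[ \t]+$//gm;

# Show the results with newlines made visible.
for my $pair ([ '/s' => $with_s ], [ '/m' => $with_m ]) {
    my ($flag, $result) = @$pair;
    (my $visible = $result) =~ s/\n/\\n/g;
    print "$flag: !$visible!\n";
}
```

Since the pattern contains no dot, /s has no effect here at all, which is why using it in place of /m leaves the inner lines untrimmed.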
> Mistaken use of s/^\s*// for trimming is seen in core modules like
> Win32, bigint.pm, and CPAN::Complete.
>
> I also found one example of s/^\s+|\s+$// (omitted /g flag means it
> fails to remove the end space from " this ") in the core modules, in
> ExtUtils::CBuilder::Platform::Windows:
>
> map {$a=$_;$a=~s/\t/ /g;$a=~s/^\s+|\s+$//;$a}
>
> Individual core modules which implement their own "trim" function
> include ExtUtils::ParseXS (trim_whitespace) and TAP::Parser (_trim).
>
>
>
>
--
Thanks,
Phil <https://metacpan.org/author/PRBRENAN>
Philip R Brenan <https://metacpan.org/author/PRBRENAN>