develooper Front page | perl.perl5.porters | Postings from March 2021

Re: Let's talk about trim() so more

Scott Baker
March 31, 2021 15:04
Excellent research Mr. Bullock!

You bring up a very good point, one that Grinnz came across
when we initially started implementing trim(). One of the main reasons
we want it done in core is that it's implemented so many times in
other places, and *often* implemented *incorrectly*. By putting it in core,
we can implement it correctly and stop developers from having to
reinvent the wheel.

In your research did you find out if people implementing trim() as a sub
do it in-place or as a return value? That seems to be hotly debated
right now.
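
To make the two styles in that question concrete, here is a minimal sketch. The names trim_copy and trim_inplace are hypothetical and chosen only for illustration; neither is the proposed core trim() nor any existing module's API.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a trimmed copy, leaving the argument untouched.
sub trim_copy {
    my ($s) = @_;
    $s =~ s/\A\s+|\s+\z//g;
    return $s;
}

# Hypothetical helper: modifies the caller's variable through the @_ alias.
sub trim_inplace {
    $_[0] =~ s/\A\s+|\s+\z//g;
    return;
}

my $orig = "  hello world \n";
my $copy = trim_copy($orig);    # $orig keeps its whitespace

my $text = "  hello world \n";
trim_inplace($text);            # $text itself is now trimmed

print "[$copy]\n";    # [hello world]
print "[$text]\n";    # [hello world]
```

The copy style composes in expressions; the in-place style avoids a copy but silently changes the caller's data.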

- Scott

On 3/31/21 6:01 AM, Ben Bullock wrote:
> On Wed, 31 Mar 2021 at 08:59, <> wrote:
>> Trimming is something that is frequently wanted, but though you
>> think it a no-brainer, people don’t always get it right. I found
>> about 7500 distributions with an "inline trim". Here are some of the
>> ones I found:
>>     s/(^\s+)|(\s+$)//;
>>     s/(^\s+)|\n//gm;
>>     s/(^\s+|\s+$)//g;
>>     s/(^\s+)|(\s+$)//g;
>>     s/(^\s+|\s+$)//os;
>>     s/(^\s+|\s+$)//gs;
>>     s/^\s*//; s/\s+$//;
>> Not all of those work.
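
One way to check the quoted claim is to run each listed substitution against a single sample string. This is only a sketch: the sample input (two spaces on each side, no newlines) is an assumption, and a pattern that passes here may still misbehave on multi-line input.

```perl
use strict;
use warnings;

# Sample input is an assumption: two spaces on each side, no newlines.
my $sample = "  foo bar  ";

# Each entry pairs a label with a sub that applies the substitution to $_.
my @candidates = (
    [ 's/(^\s+)|(\s+$)//'  => sub { s/(^\s+)|(\s+$)//  } ],
    [ 's/(^\s+)|\n//gm'    => sub { s/(^\s+)|\n//gm    } ],
    [ 's/(^\s+|\s+$)//g'   => sub { s/(^\s+|\s+$)//g   } ],
    [ 's/(^\s+)|(\s+$)//g' => sub { s/(^\s+)|(\s+$)//g } ],
    [ 's/(^\s+|\s+$)//os'  => sub { s/(^\s+|\s+$)//os  } ],
    [ 's/(^\s+|\s+$)//gs'  => sub { s/(^\s+|\s+$)//gs  } ],
    [ 's/^\s*//; s/\s+$//' => sub { s/^\s*//; s/\s+$// } ],
);

my %trimmed;
for my $c (@candidates) {
    my ($label, $code) = @$c;
    local $_ = $sample;          # work on a fresh copy each time
    $code->();
    $trimmed{$label} = $_;
    printf "%-22s %s\n", $label, ($_ eq "foo bar" ? "ok" : "leaves [$_]");
}
```

On this input the three variants without a working /g for the trailing side leave the trailing spaces in place.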
> There are any number of "gotcha" failures using regex trim on CPAN and
> even within Perl core modules. Further, at least two core module
> authors have duplicated "trim".
> A search for "trim string" on finds
>                 # trim leading and trailing white spaces
>                 $val =~ s/^\s*|\s*$//;
> This substitution will return true even if it matches nothing due to
> the asterisk, and it will fail to remove trailing whitespace if there
> is also leading whitespace due to the lack of a /g flag.
>     $ perl -e 'my $g="   x   ";$g=~s/^\s+|\s+$//;print "!$g!\n";'
>     !x   !
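
The two effects described above (the always-true return value with \s*, and the missing /g) can be seen side by side in a short sketch. The variable names are illustrative only.

```perl
use strict;
use warnings;

# 1. Return value: \s* can match the empty string, so the substitution
#    reports success even when there is nothing to trim.
my $s1 = "x";
my $star_result = ($s1 =~ s/^\s*|\s*$//) ? "true" : "false";   # "true"

my $s2 = "x";
my $plus_result = ($s2 =~ s/^\s+|\s+$//) ? "true" : "false";   # "false"

# 2. Missing /g: only the first alternative that matches is replaced.
my $g = "   x   ";
(my $without_g = $g) =~ s/^\s+|\s+$//;     # "x   "
(my $with_g    = $g) =~ s/^\s+|\s+$//g;    # "x"

print "star: $star_result, plus: $plus_result\n";
print "!$without_g!\n";
print "!$with_g!\n";
```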
> We can find many more examples of the "omitted /g" error on CPAN:
> %2F%5CE%5B%5Eg%5D*%24&qd=&qft=
> Using * instead of + after \s causes the substitution to always return
> a true value even if nothing changed. This is also fairly common:
> qd=&qft=
> It says "80 distributions". I looked through all of them but I didn't
> find anywhere where the return value of the substitution was being
> used, perhaps because that bug would have been caught quickly, except
> for here:
> %2F%2F&qft=&qd=CohortExplorer
> where the programmer seems actually to be using the fact that it
> always returns a true value.
> Furthermore, there are several examples in Perl core modules.
> Mistaken use of the /s flag (make . match \n) to mean /m (make ^ and $
> match new lines) is seen in such modules as Pod::Simple, CPAN::Module,
> Net::SMTP, Pod::Checker, Locale::Maketext, and I18N::LangTags.
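
A small sketch of the /s versus /m difference on a two-line string. The input is an assumption for illustration and is not taken from any of the modules named above.

```perl
use strict;
use warnings;

# Assumed input: two lines, each padded with two spaces on both sides.
my $lines = "  a  \n  b  ";

# /s only changes what "." matches; ^ and $ still anchor to the whole string,
# so the whitespace around the inner newline is left alone.
(my $with_s = $lines) =~ s/^\s+|\s+$//gs;    # "a  \n  b"

# /m lets ^ and $ match at every line boundary, so each line is trimmed.
(my $with_m = $lines) =~ s/^\s+|\s+$//gm;    # "a\nb"

print "[$with_s]\n";
print "[$with_m]\n";
```

Whether per-line trimming is what a given caller wants depends on the code in question; the point is only that /s does not provide it.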
> Mistaken use of s/^\s*// for trimming is seen in core modules like
> Win32,, and CPAN::Complete.
> I also found one example of s/^\s+|\s+$// (omitted /g flag means it
> fails to remove the end space from " this ") in the core modules, in
> ExtUtils::CBuilder::Platform::Windows:
>     map {$a=$_;$a=~s/\t/ /g;$a=~s/^\s+|\s+$//;$a}
> Individual core modules which implement their own "trim" function
> include ExtUtils::ParseXS (trim_whitespace) and TAP::Parser (_trim).
