perl.perl5.porters | Postings from May 2021

Re: Revisiting trim

From: André Warnier (tomcat/perl)
Date: May 28, 2021 10:01
Subject: Re: Revisiting trim
Message ID: 5fc1431b-5129-ad86-8115-dac42ed04c7b@ice-sa.com

Hi.
As a long-time perl *applications* programmer, I'd like to contribute a couple of things 
and ask a question.

1) maybe 50% of the usage of perl I've had over the last 30 years (and probably 95% of the 
CPU time used with it over that same time) has consisted of processing text (historically 
Terabytes of it, and still Gigabytes of it every day) in more or less complex ways, 
something which perl has always been particularly good at. "Good" being understood as "you 
can do anything with it" and "fast".

2) if there were a trim() (or trimmed()) function directly in the base language, it 
would be welcome, not only for the functionality itself, but also as a way to avoid those 
ever-recurring nagging comments from non-perl people about how "unnecessarily 
complicated/clumsy" this looks in perl, as compared to all these "more modern" 
languages where it is built-in. (So, see this at least in part as a little drop in the 
general bucket of avoiding things which could discourage new potential perl aficionados).

3) many many times when processing textual data, it is convenient and/or necessary to 
strip *trailing* spaces, /without/ stripping *leading* spaces.  Trailing spaces are 
generally not significant and mostly use up disk/memory space unnecessarily.
But leading spaces often fulfill some need for alignment or syntax, and should not always 
be stripped. Thus, if a single trimmed() function was provided, which always trims both 
sides, it would in my view be insufficient, make its usage quite conditional, and even 
sometimes make the deciphering of code (written by someone else) more difficult.
(Like : did they *know* that it trims both sides ? or was that a typo ?). And it would 
still leave the "trim only trailing spaces" functionality to be expressed differently, 
which sounds a bit awkward, even if it quite fits the basic perl TIMTOWTDI philosophy.
In other words, I would strongly favor either 3 functions (trimmed, rtrimmed, ltrimmed) or 
trimmed($subject{,"L(eft)"|"B(oth)"|"R(ight)"}), with the default being Both.
(which kind of suggests 1|0|-1 instead as 2nd optional argument, a bit like substr() and 
co. where "-1" tends to mean "start from the end backwards", no ?)
(And maybe ltrimmed and rtrimmed can just be internal "aliases" to trimmed)
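
To make the second variant concrete, here is a minimal pure-perl sketch of what is 
meant. It is only a stand-in for illustration, not the proposed builtin: the names 
trimmed/ltrimmed/rtrimmed come from the suggestion above, but the mapping of 1|0|-1 to 
sides (1 = left, 0 = both, -1 = right, by analogy with substr() counting from the end) is 
an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the suggested function; not an existing builtin.
# Assumed convention: 1 = left only, 0 = both (default), -1 = right only.
sub trimmed {
    my ($subject, $side) = @_;
    $side //= 0;                          # default: trim both sides
    my $copy = $subject;                  # leave the caller's string untouched
    $copy =~ s/^\s+// if $side >= 0;      # leading whitespace (left or both)
    $copy =~ s/\s+$// if $side <= 0;      # trailing whitespace (right or both)
    return $copy;
}

# Thin aliases on top of trimmed().
sub ltrimmed { return trimmed($_[0],  1) }
sub rtrimmed { return trimmed($_[0], -1) }

my $line = "    indented text   \n";
print '[', rtrimmed($line), "]\n";        # prints "[    indented text]"
print '[', trimmed($line),  "]\n";        # prints "[indented text]"
```

With rtrimmed() the leading alignment survives and only the trailing spaces and newline 
go away, which is the case argued for above.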

4) given the expectations of vintage perl programmers regarding perl's 
text-processing prowess (see above), *if* such function(s) were to be provided, one would 
expect it/them to be at least as fast as the best ("unnecessarily complicated/clumsy 
looking") regex achieving the same thing.

And finally, the question : several times in this discussion I have read that, left to 
their own devices currently (meaning with regexp), naive perl programmers do it "in the 
worst way possible".
Now which way is that ?
I admit that for 30+ years, I have been doing this without much thinking about it (once I 
got over my initial wonder 30 years ago at there not being a trim() function) :

my $line = <>; # e.g.
my $stripped_line = $line; # keep the original as is, work on a copy
$stripped_line =~ s/^\s+//; $stripped_line =~ s/\s+$//; # or only one of those, depends

Is /that/ the worst possible way ? or if not *the* worst, was there a better way all along 
? (*)

(I should probably add that in 30 years, I have probably not written a single perl program 
where some form of the above trimming did not happen).
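
For anyone wanting to measure rather than guess, the sketch below compares the 
two-substitution idiom above with two other spellings that are often seen. The choice of 
the other two variants and the sample string are assumptions, picked for illustration 
only. perlfaq4 does note that the combined single substitution is slower than the two 
separate ones, but the actual numbers depend on the perl version and on the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Three ways to trim both sides; each one works on a copy of its argument.
my %variants = (
    # The idiom above: two anchored substitutions.
    two_passes  => sub { my $s = shift; $s =~ s/^\s+//; $s =~ s/\s+$//; $s },
    # One substitution with alternation and /g.
    alternation => sub { my $s = shift; $s =~ s/^\s+|\s+$//g; $s },
    # Capture the middle part and replace the whole string with it.
    capture     => sub { my $s = shift; $s =~ s/^\s*(.*?)\s*$/$1/s; $s },
);

# Illustrative input: padding on both sides, spaces inside, trailing newline.
my $sample = (' ' x 8) . 'some text with  inner   spaces' . (' ' x 8) . "\n";

# Wrap each variant so that it is timed on the same sample string.
my %timed;
for my $name (keys %variants) {
    my $code = $variants{$name};
    $timed{$name} = sub { $code->($sample) };
}

# A negative count asks Benchmark for at least that many CPU seconds per variant.
cmpthese(-1, \%timed);
```

All three return the same string for this input, so the comparison is only about speed.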

(*) if yes, knowing this from the beginning would probably have helped avoid the 
current climate crisis

On 28.05.2021 09:26, demerphq wrote:
> On Thu, 27 May 2021 at 22:17, Paul "LeoNerd" Evans
> <leonerd@leonerd.org.uk> wrote:
>>
>> On Thu, 27 May 2021 21:13:35 +0200
>> demerphq <demerphq@gmail.com> wrote:
>>
>>> Having said that, making the function return a result and not do
>>> inplace edit is a massive speed penalty and will likely mean that
>>> those using custom xs already to do this (my workplace) won't
>>> migrate. At least for us the point is to do it quickly, not to do it
>>> in a more self explanatory way.
>>
>> The implementation already detects if target SV == source SV, and edits
>> in-place if that is the case.
>>
>>    $str = trim $str;
>>
>> will be an inplace edit.
>>
>> Don't conflate "the user must write `trim $str` as a mutating keyword"
>> with "the implementation will mutate an existing SV inplace".
> 
> Ah, so that would be this implementation is hairier than it would need
> to be if the argument was modified in place without this type of
> detection, it also explains one of your other comments that didn't make
> sense to me.
> 
> Thanks,
> 
> Yves
> 
> 

