Re: Revisiting trim
May 27, 2021 19:14
Message ID: CANgJU+UA1ae+2vWTirjG_wy4goCAxccaQbZEYUfM=ekrpwNJZg@mail.gmail.com
On Thu, 27 May 2021, 19:10 mah.kitteh via perl5-porters, <
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> On Thursday, May 27, 2021 11:59 AM, Tomasz Konojacki <email@example.com> wrote:
> > On Thu, 27 May 2021 16:44:42 +0000
> > "mah.kitteh via perl5-porters" firstname.lastname@example.org wrote:
> > > This is quite presumptuous. There has been no conversation on where to
> > > place this. It's very concerning to me that there has also been very little
> > > discussion about "where" to place this "single" (yeah right) core feature.
> > > At this point, and mainly due to the pressure and rush being applied to
> > > this, my general concern as I said last night is not necessarily "trim" as
> > > the POC is currently implemented; but what comes after "trim" and how it's
> > > handled - string related or not. So what's the rush? No rush exists other
> > > than the proof of concept work facing potential bit rot. That's not really
> > > perl's problem.
> > There was a lot of conversation. There are literally hundreds of posts
> > about trim on p5p and github. The discussion has been going for almost a
> > year now.
> > https://github.com/Perl/perl5/issues/17952
> > https://github.com/Perl/perl5/pull/17999
> > In chronological order:
> I am aware of those, but I appreciate you taking the time to provide the
> links.
> What I can't seem to find is the conversation on why it needs to be
> implemented at such a low level. If I understood this particular piece with
> some clarity then I'd be happy to never post about "trim" again.
If you mean "why does this warrant a C-level implementation", then there are
a couple of answers. The simplest is that the particular type of regex
engine we use doesn't deal with this type of pattern well. A more complex
version is that it is not a DFA, it does not know how to match utf8
backwards, and it is non-trivial to teach it to do so. And people tend to
write the worst possible regexen to do it anyway. The end result is that
trimming strings can be a surprisingly expensive task if not done artfully,
and the code to do it is pretty cryptic, so having a function really helps
both performance and code clarity.
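To illustrate the paragraph above, here is a minimal pure-Perl sketch of three
ways trimming tends to get written: a single capture-based substitution, two
anchored substitutions, and an explicit scan inward from both ends. The sub
names are made up for this example. The scan only mirrors the idea of walking
in from the ends; it is not the proposed core code, and nothing here measures
actual timings.

```perl
use strict;
use warnings;

# One capture-based substitution, a form often seen in the wild.
# The engine has to grow the lazy (.*?) one character at a time.
sub trim_capture {
    my ($s) = @_;
    $s =~ s/^\s*(.*?)\s*$/$1/s;
    return $s;
}

# Two anchored substitutions, one per end of the string.
sub trim_two_step {
    my ($s) = @_;
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Explicit scan: walk in from the end, then from the start,
# and take the substring between the two positions.
sub trim_scan {
    my ($s) = @_;
    my $end = length $s;
    $end-- while $end > 0 && substr($s, $end - 1, 1) =~ /\s/;
    my $start = 0;
    $start++ while $start < $end && substr($s, $start, 1) =~ /\s/;
    return substr($s, $start, $end - $start);
}

printf "[%s]\n", trim_capture("  hello world \n");
printf "[%s]\n", trim_two_step("  hello world \n");
printf "[%s]\n", trim_scan("  hello world \n");
```

All three return the same string for the same input; they differ only in how
much work the regex engine is asked to do and in how readable they are.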
Having said that, making the function return a result rather than edit in
place carries a massive speed penalty, and will likely mean that those
already using custom XS to do this (my workplace, for one) won't migrate. At
least for us the point is to do it quickly, not to do it in a more
self-explanatory way.
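For readers unfamiliar with the distinction, a small pure-Perl sketch of the
two calling styles follows. The names trim_copy and trim_inplace are
hypothetical and are not the interface under discussion. The example only
shows the difference in semantics (a new value versus modifying the caller's
variable through the @_ alias), not the cost difference an XS or core
implementation would see.

```perl
use strict;
use warnings;

# Returns a trimmed copy; the caller's variable is left untouched.
sub trim_copy {
    my ($s) = @_;            # copies the argument into a new scalar
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

# Edits the caller's variable directly: elements of @_ are aliases
# to the arguments, so no new string is handed back.
sub trim_inplace {
    $_[0] =~ s/\A\s+//;
    $_[0] =~ s/\s+\z//;
    return;
}

my $line = "  padded value \n";
my $copy = trim_copy($line);   # $line still has its padding here
trim_inplace($line);           # now $line itself is trimmed

printf "[%s] [%s]\n", $copy, $line;
```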
Anyway, I just wanted to point out that doing trim properly in Perl, with
its bifocal strings, and taking account of utf8 and Unicode whitespace rules,
is not quite as trivial as it might sound.
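One way to see the two-representation problem, as a hedged sketch: the same
character, U+00A0 NO-BREAK SPACE, may or may not count as \s depending on how
the string is stored internally. This assumes a perl of 5.14 or later (for
the /u modifier) with neither the unicode_strings feature nor a locale in
scope. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $native   = "\xA0";        # NO-BREAK SPACE held as a single byte
my $upgraded = "\xA0";
utf8::upgrade($upgraded);     # same character, UTF-8 internal storage

# Default matching rules depend on the internal representation.
my $native_d   = $native   =~ /\A\s\z/  ? 1 : 0;
my $upgraded_d = $upgraded =~ /\A\s\z/  ? 1 : 0;

# /u asks for Unicode rules regardless of representation.
my $native_u   = $native   =~ /\A\s\z/u ? 1 : 0;

# The two strings still compare equal as Perl values.
my $equal      = $native eq $upgraded   ? 1 : 0;

printf "native: %d, upgraded: %d, native with /u: %d, equal: %d\n",
    $native_d, $upgraded_d, $native_u, $equal;
```

Under those assumptions the byte-stored string should not match \s by
default, while the upgraded copy and the /u match should, even though the two
strings are equal. A trim that wants consistent results has to pick one set
of rules and apply it to both representations.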