perl.perl5.porters | Postings from February 2022

Re: trim vs trimmed revisited

February 24, 2022 10:28
On Thu, 24 Feb 2022 at 11:09, Darren Duncan wrote:

> On 2022-02-24 1:11 a.m., demerphq wrote:
> > Our use of in place modifications allows us to operate on very large
> > strings without incurring huge overheads. Chopping a 4GB buffer does not
> > result in us using 8GB. Trimming a buffer with 4GB of text without inplace
> > modification WOULD result in using 8GB. What technical reason is there for
> > us to pay that penalty?
>
> A key general benefit of immutability is you can share memory/representation
> between multiple instances or similar values because you know it isn't going
> to be modified out from under you.
>
> For data types resembling a long sequence, such as large text blocks, many
> operations could be represented symbolically behind the scenes.
>
> For example, the result of trim could just be a tiny structure that says,
> here is a string whose value is the substring of that other string between
> these 2 index positions, which in practice are likely near each end of the
> original.
>
> Then your 2 slightly different 4GB strings only occupy the 4GB of memory
> once.
>
> Obviously such an implementation means that other parts of the system are
> more complicated, each other string processing operation needs to handle the
> symbolic representation as well as the other one, and possibly operate
> differently depending on what it has to do.
>
> In some cases the symbolic version just means the trim results in a lazy
> copy rather than an eager one, but other times it never has to be done at
> all.
>
> So there are trade-offs that can reduce memory use and increase performance
> in exchange for some greater complexity of some logic but also reduced
> complexity in other logic.
>
> Note that I've been thinking about these matters a lot as I'm in the process
> of implementing a language where practically everything is immutable types.
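
For concreteness, the "tiny structure" idea in the quoted text could be
sketched in Perl roughly as below. This is only an illustration: the StrView
class and its methods are hypothetical names, not anything in core, and the
trailing-whitespace regex is chosen for brevity rather than speed on very
large buffers.

```perl
use strict;
use warnings;

# Hypothetical class: an immutable "view" onto part of another string.
package StrView {
    # $buf_ref is a reference to the original scalar; nothing is copied here.
    sub new {
        my ($class, $buf_ref, $start, $len) = @_;
        return bless { buf => $buf_ref, start => $start, len => $len }, $class;
    }

    # Build a view that skips leading and trailing whitespace.
    sub trimmed {
        my ($class, $buf_ref) = @_;
        my $start = $$buf_ref =~ /\A\s+/ ? $+[0] : 0;
        my $end   = $$buf_ref =~ /\s+\z/ ? $-[0] : length $$buf_ref;
        $end = $start if $end < $start;    # input was all whitespace
        return $class->new($buf_ref, $start, $end - $start);
    }

    sub len { $_[0]{len} }

    # The copy happens only here, and only if a caller asks for it.
    sub materialize {
        my ($self) = @_;
        return substr ${ $self->{buf} }, $self->{start}, $self->{len};
    }
}

my $big  = "   hello world \n";
my $view = StrView->trimmed(\$big);        # two offsets, no second buffer
print $view->len, "\n";
print "[", $view->materialize, "]\n";
```

The cost noted in the quoted text shows up immediately: every other string
operation would need to understand StrView, or call materialize and pay for
the copy anyway.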

You just described COW and some other features in core (some whose name
escapes me right now, LVALUE strings maybe? I forget.) They aren't always
the performance win that simple in-place modification is, and often /still/
result in memory duplication when it wouldn't strictly be necessary.
Devel::Peek is your friend. When you see a string (PV in perl internals
parlance) marked as COW, it means its buffer is shared with up to 255 other
SVs; you can see the COW_REFCNT as well.
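
A minimal way to look at this with Devel::Peek is sketched below. Whether a
given assignment actually shares the buffer depends on the perl version and
on internal thresholds, so treat the comments as things to look for in the
dump rather than guaranteed output.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $original = join '', ('x') x 2000;    # built at run time
my $copy     = $original;                # may share the buffer (COW)

# Dump writes to STDERR. On a COW-enabled perl, look for IsCOW in the
# FLAGS line and a COW_REFCNT line in the PV section.
Dump($original);

$copy .= '!';    # writing to the copy forces it onto its own buffer
Dump($copy);
```

Comparing the PV addresses in the dumps before and after the write is a
quick way to see when a second buffer really gets allocated.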

I spent a lot of time working on routines like trim and friends for
generating Booking's website. In-place modification can be a *huge* win.
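
As a rough illustration of the two styles under discussion, here is a sketch
in plain Perl. The helper names trim_in_place and trimmed_copy are
hypothetical and are not the proposed builtin; the in-place version relies on
@_ aliasing the caller's variable.

```perl
use strict;
use warnings;

# Hypothetical helper: trims the caller's variable in place via @_ aliasing.
sub trim_in_place {
    $_[0] =~ s/\A\s+//;
    $_[0] =~ s/\s+\z//;
    return;
}

# Hypothetical helper: returns a trimmed copy, leaving the argument untouched.
sub trimmed_copy {
    my ($s) = @_;    # this assignment may copy (or COW-share) the buffer
    $s =~ s/\A\s+//;
    $s =~ s/\s+\z//;
    return $s;
}

my $buf = "  some text  ";
my $new = trimmed_copy($buf);    # $buf unchanged, $new is a second string
trim_in_place($buf);             # $buf itself is modified
```

With a small string the difference is invisible; with a multi-gigabyte buffer
the copying version needs room for both strings at once, which is the penalty
the thread is arguing about.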

I encourage you to take the time to learn more about how the internals
actually work. I see you are doing interesting things and asking
interesting questions, and have interesting ideas, and I think you would
find it a source of inspiration, and at the same time maybe your ideas
might be a source of inspiration for new ideas in the core.

