On Thu, 24 Feb 2022 at 11:09, Darren Duncan <darren@darrenduncan.net> wrote:

> On 2022-02-24 1:11 a.m., demerphq wrote:
> > Our use of in place modifications allows us to operate on very large strings
> > without incurring huge overheads. Chopping a 4GB buffer does not result in us
> > using 8GB. Trimming a buffer with 4GB of text without inplace modification WOULD
> > result in using 8GB. What technical reason is there for us to pay that penalty?
>
> A key general benefit of immutability is you can share memory/representation
> between multiple instances or similar values because you know it isn't going to
> be modified out from under you.
>
> For data types resembling a long sequence, such as large text blocks, many
> operations could be represented symbolically behind the scenes.
>
> For example, the result of trim could just be a tiny structure that says, here
> is a string whose value is the substring of that other string between these 2
> index positions, which in practice are likely near each end of the original.
>
> Then your 2 slightly different 4GB strings only occupy the 4GB of memory once.
>
> Obviously such an implementation means that other parts of the system are more
> complicated, each other string processing operation needs to handle the symbolic
> representation as well as the other one, and possibly operate differently
> depending on what it has to do.
>
> In some cases the symbolic version just means the trim results in a lazy copy
> rather than an eager one, but other times it never has to be done at all.
>
> So there are trade-offs that can reduce memory use and increase performance in
> exchange for some greater complexity of some logic but also reduced complexity
> in other logic.
>
> Note that I've been thinking about these matters a lot as I'm in the process of
> implementing a language where practically everything is immutable types.

You just described COW and some other features in core (some whose name escapes me right now, LVALUE strings maybe? I forget.) They aren't always the performance win that simple in place modification is, and often /still/ result in memory duplication when it wouldn't strictly be necessary.

Devel::Peek is your friend. When you see a string (PV in perl internals parlance) marked as COW it means it is shared with up to 255 other SVs; you can see the COW_REFCOUNT as well.

I spent a lot of time working on routines like trim and friends for generating Bookings website. In place modification can be a *huge* win.

I encourage you to take the time to learn more about how the internals actually work. I see you are doing interesting things and asking interesting questions, and have interesting ideas, and I think you would find it a source of inspiration, and at the same time maybe your ideas might be a source of inspiration for new ideas in the core.

cheers,
Yves
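
The two approaches discussed in this message can be contrasted in a small sketch. This is an illustration only, written in Python rather than Perl, and it does not show how perl's COW or in-place string operations are implemented. The function names `trimmed_view` and `trim_in_place` are hypothetical. The first returns a tiny view that records a start and end position over the original buffer (the "symbolic substring" idea from the quoted text); the second shrinks the same buffer object instead of building a second one.

```python
def trimmed_view(data):
    """Return a zero-copy view of `data` without leading/trailing spaces."""
    start, end = 0, len(data)
    while start < end and data[start] == 0x20:
        start += 1
    while end > start and data[end - 1] == 0x20:
        end -= 1
    # The slice shares memory with `data`; no bytes are copied here.
    return memoryview(data)[start:end]


def trim_in_place(data):
    """Strip leading/trailing spaces by modifying the same bytearray."""
    end = len(data)
    while end > 0 and data[end - 1] == 0x20:
        end -= 1
    del data[end:]          # drop trailing spaces
    start = 0
    while start < len(data) and data[start] == 0x20:
        start += 1
    del data[:start]        # drop leading spaces


buf = bytearray(b"   hello world   ")

view = trimmed_view(buf)    # shares buf's memory
assert bytes(view) == b"hello world"

# While a view is outstanding, Python refuses to resize buf, so the
# view must be released before the in-place trim can run.
view.release()
trim_in_place(buf)
```

The `release()` step hints at the trade-off raised in both messages: sharing a representation constrains what can be done to the original, and that bookkeeping is where the extra complexity tends to land.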