On Sun, Aug 5, 2012 at 4:39 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> You have not considered what happens for queries on the by-perl-release
> axis, in which case under your scheme the only solution is to load and
> decode every single line and then do a hash access on it.

I have considered it, and I consider it rare enough to dismiss out of
hand.  I suspect it primarily benefits porters.  (Which is not to say
that we're not important and worth optimizing for, but I still think
it's the minority case.)

>> Only in the rarest of cases do you need to load the whole file -- most
>> uses are still "corelist Foo", which benefits hugely from bisection.
>
> Err. You are proposing that doing repeated disk seeks will be faster
> than reading 25KB in a single I/O operation and then inflating that to
> 500KB in memory using gzip (which decompresses at near memcpy speed),
> and then scanning that string.
>
> I, uhm… Are you sure you want to have that match?

/me shrugs.  Don't know.  Don't terribly care, really.  For the most
part, I find it ugly for a design to require loading everything only to
throw away 99%+ of it most of the time, even if Moore's law has saved
us from caring.

I'm certainly not suggesting my idea is the *best* way (for any
definition of best); I was reacting to the "it hurts when I do this"
comment you made, so my reaction was "then don't do that".

I don't know what anyone really cares about optimizing here.  It
started (mostly) with the desire to minimize size on disk.  Minimizing
memory use and run time are nice, too.  So is minimizing p5p
maintenance hassle.  I'd be happy with an approach that does a decent
job across all of those criteria rather than focusing excessively on
any single one.

What I *don't* like about compression (even though I've played around
with such approaches) is that it's opaque on disk, which makes
reviewing deltas extremely difficult.  What changed from version X to
Y?  Did module Z actually get updated or not?  Ugh.  Let's unpack the
data (or run the code) and find out... instead of just "git diff".  But
it's wizard for minimizing disk size, which is where this whole idea
started.
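To make the bisection half of that trade-off concrete, here's a rough
sketch of the lookup I have in mind.  It assumes a *hypothetical*
on-disk format -- one tab-separated record per line ("Module\tfirst
perl release\tmodule version"), sorted by module name, no blank lines
-- and bisects on byte offsets the way BSD look(1) does, so a
"corelist Foo" query reads only a handful of blocks instead of the
whole file:

    use strict;
    use warnings;

    # Sketch only: bisect a sorted plain-text data file for one
    # module's records without slurping the whole file.  The file
    # format (tab-separated, sorted by module name) is an assumption,
    # not what Module::CoreList ships today.
    sub corelist_lookup {
        my ($path, $module) = @_;
        open my $fh, '<', $path or die "open $path: $!";
        my ($lo, $hi) = (0, -s $fh);

        # Narrow [lo, hi] by seeking to the midpoint, resyncing to the
        # next line boundary, and comparing keys; stop once the window
        # is small enough to scan linearly.
        while ($hi - $lo > 4096) {
            my $mid = int( ($lo + $hi) / 2 );
            seek $fh, $mid, 0;
            <$fh>;                     # discard the (possibly partial) line
            my $pos  = tell $fh;
            my $line = <$fh>;
            if (!defined $line) { $hi = $mid; next }   # ran off the end
            my ($key) = split /\t/, $line;
            if ($key ge $module) { $hi = $mid }  # match starts before here
            else                 { $lo = $pos }  # line at $pos sorts earlier
        }

        # Linear scan of what's left of the window.
        seek $fh, $lo, 0;
        <$fh> if $lo;                  # resync to the next line boundary
        my @records;
        while (my $line = <$fh>) {
            chomp $line;
            my ($name, @fields) = split /\t/, $line;
            last if $name gt $module;  # sorted, so we can stop early
            push @records, \@fields if $name eq $module;
        }
        return @records;
    }

So something like corelist_lookup("corelist.txt", "Storable") -- file
name invented for the example -- would touch O(log n) blocks plus one
small sequential read, versus decompressing and scanning everything.
Whether that actually beats a single 25KB read + gunzip in wall-clock
time is exactly the match I said I don't care to have; the point is
just that the plain-text layout makes it possible *and* stays
diffable.

-- David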