Re: [PATCH] Module::CoreList delta support

From: David Golden
Date: August 5, 2012 17:38
Subject: Re: [PATCH] Module::CoreList delta support
Message ID: CAOeq1c_PyXFJPVjFtA8j5uXr70z-r6TYKqHZNx87A8fhcKcnig@mail.gmail.com

On Sun, Aug 5, 2012 at 4:39 PM, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> You have not considered what happens for queries on the by-perl-release
> axis, in which case under your scheme the only solution is to load and
> decode every single line and then do a hash access on it.

I have considered it and I consider it rare enough to dismiss out of
hand.  I suspect it primarily benefits porters.  (Which is not to say
that we're not important and worth optimizing for, but I still think
it's the minority case.)

>> Only in the rarest of cases do you need to load the whole file -- most
>> uses are still "corelist Foo", which benefits hugely from bisection.
>
> Err. You are proposing that doing repeated disk seeks will be faster
> than reading 25KB in a single I/O operation and then inflating that to
> 500KB in memory using gzip (which decompresses at near memcpy speed),
> and then scanning that string.
>
> I, uhm… Are you sure you want to have that match?

/me shrugs.  Don't know.  Don't terribly care, really.  For the most
part, I find a design that requires loading everything, only to throw
away 99%+ of it most of the time, ugly, even if Moore's law has saved
us from caring.
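
For concreteness, here is a minimal sketch of what bisecting a file on
disk could look like, in Python for brevity.  It is not the patch under
discussion.  It assumes a hypothetical format of one line per module,
"Name<TAB>payload", sorted bytewise by name; the function names are
made up for illustration.

```python
import os

def _line_at_or_after(fh, pos):
    """Return the first whole line starting at or after byte offset pos (b"" at EOF)."""
    if pos > 0:
        fh.seek(pos - 1)
        fh.readline()  # consume through the newline of the line holding byte pos-1
    else:
        fh.seek(0)
    return fh.readline()

def _key(line):
    """Module name: everything before the first tab (assumed format)."""
    return line.rstrip(b"\n").partition(b"\t")[0]

def lookup(path, module):
    """Bisect a sorted 'Name<TAB>payload' file; return the payload or None."""
    target = module.encode("utf-8")
    with open(path, "rb") as fh:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line has a key >= target.
        while lo < hi:
            mid = (lo + hi) // 2
            line = _line_at_or_after(fh, mid)
            if not line or _key(line) >= target:
                hi = mid
            else:
                lo = mid + 1
        line = _line_at_or_after(fh, lo)
    if line and _key(line) == target:
        return line.rstrip(b"\n").partition(b"\t")[2].decode("utf-8")
    return None
```

Each probe costs one seek and at most two short reads, so a lookup
touches roughly log2(file size) places rather than the whole file.
Whether that actually beats one read plus gunzip is exactly the part I
haven't measured.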

I'm certainly not suggesting my idea is the *best* way (for any
definition of best); I was reacting to the "it hurts when I do this"
comment you made, and thus my reaction was "so don't do that".

I don't know what anyone really cares about optimizing here.  It
started (mostly) with the desire to minimize size on disk.  Minimizing
memory use and maximizing speed are nice, too.  So is minimizing p5p
maintenance hassle.  I'd be happy with an approach that does a decent
job across criteria rather than focusing excessively on any single one.

What I *don't* like about compression (even though I played around
with such approaches) is that it's opaque on disk, which makes
reviewing deltas extremely difficult.  What changed from version X to
Y?  Did module Z actually get updated or not?  Ugh.  Let's unpack the
data (or run the code) and find out... instead of "git diff".  But
it's wizard for minimizing disk size, which is where this whole idea
started.
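
To illustrate the reviewability point, a small sketch in Python.  The
module names and versions are invented, and the one-line-per-module
layout is only an assumption about what a plain-text format might look
like.

```python
import difflib
import gzip

# Made-up module names and versions, purely for illustration.
old = "Bar\t1.00\nBaz\t0.07\nFoo\t2.10\n"
new = "Bar\t1.00\nBaz\t0.08\nFoo\t2.10\n"

# Plain text: a line-based diff pinpoints the single changed entry.
text_diff = list(difflib.unified_diff(
    old.splitlines(), new.splitlines(),
    fromfile="old", tofile="new", lineterm="", n=0))

# Compressed: the two blobs differ, but only as bytes with no line
# structure, so a reviewer has to decompress before comparing anything.
old_gz = gzip.compress(old.encode(), mtime=0)
new_gz = gzip.compress(new.encode(), mtime=0)

print("\n".join(text_diff))
print("compressed blobs differ:", old_gz != new_gz)
```

The text diff names the module that changed; the compressed form only
tells you that something did.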

-- David
