
How To Build A Perl Package Database

Michael G Schwern
December 15, 2012 22:59
We have a lot of serious problems because we lack a database of installed
distributions, releases and files.  There are serious problems with
implementing one given A) the limitations of the standard Perl install and B)
wedging it into existing systems.  But I think I have a solution.  It's similar
to how metadata was slipped into the ecosystem without requiring authors to
rewrite their releases or install a bunch of extra modules.  It just happens
as part of the normal CPAN module upgrade process.

I've been thinking that a minimal package database could be created by putting
some hooks into ExtUtils::Install::install(), which every Perl build system
ultimately uses, to record what gets installed.  That way when
ExtUtils::Install is upgraded, the user gets a build database without
upgrading everything else.

This would be a fairly straightforward process at install time...

1) Copy everything to a temp directory
2) Record everything in that temp directory
3) Copy everything from temp into the real location

You could probably optimize this by skipping the copy to temp and just have
install() record stuff as it goes by, but this is the dumb, simple, robust way
to do it.
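
The three steps above can be sketched as follows.  This is only an
illustration in Python, not ExtUtils::Install code: the function name
staged_install, the manifest layout and the choice of SHA-256 are all
assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def staged_install(blib, dest):
    """Stage blib in a temp dir, record it, then copy it into dest.

    Returns a manifest mapping each relative file path to its SHA-256.
    (Hypothetical helper; the manifest format is an assumption.)
    """
    manifest = {}
    with tempfile.TemporaryDirectory() as tmp:
        stage = Path(tmp) / "stage"
        # 1) Copy everything to a temp directory.
        shutil.copytree(blib, stage)
        # 2) Record everything in that temp directory.
        for path in sorted(stage.rglob("*")):
            if path.is_file():
                rel = path.relative_to(stage).as_posix()
                manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
        # 3) Copy everything from temp into the real location.
        shutil.copytree(stage, dest, dirs_exist_ok=True)
    return manifest
```

The optimized variant would compute the same manifest while copying
straight from blib, at the cost of mixing recording into the copy loop.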

Storage is a problem.  The only reliable "database" Perl ships with is DBM, an
on disk hash, so we can't get too fancy.  It might take several DBM files, but
this is enough to record information and do simple queries.  What are those
queries?

* What version of the database is this?
* What distributions are installed?
* What release of a distribution is installed?
* What files are in that release?
* What version is that release?
* What location was a release installed into? (core, vendor, site, custom)
* What are the checksums of those files?

And the basic operations we need to support.

* Add a release (i.e. install).
* Delete a release (and its files).
* Delete an older version of a release (as part of install).
* Delete an older version of a release, only if it's in the same release
  location.  This is so CPAN installs don't delete vendor installed modules.
* Verify the files of a release.
* List distributions/releases installed.
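
To make the operations concrete, here is a rough in-memory model of
them, written in Python for illustration.  The class InstallDB, its
method names and its record layout are hypothetical.  The point is the
behavior: releases are keyed by distribution AND location, so replacing
a site install never touches the vendor copy.

```python
import hashlib
from pathlib import Path


def _sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class InstallDB:
    """In-memory stand-in for the install database (hypothetical layout)."""

    def __init__(self):
        # (distribution, location) -> {"version": str, "files": {path: sha256}}
        self.releases = {}

    def add(self, dist, version, location, files):
        """Record a release whose files are already installed on disk."""
        new_files = {str(p): _sha256(p) for p in files}
        old = self.releases.get((dist, location))
        if old is not None:
            # Remove files the older release owned that the new one dropped.
            # Only the same location is considered, so a CPAN install into
            # site does not delete vendor installed modules.
            for stale in set(old["files"]) - set(new_files):
                Path(stale).unlink(missing_ok=True)
        self.releases[(dist, location)] = {"version": version, "files": new_files}

    def delete(self, dist, location):
        """Forget a release and remove its files.  False if not recorded."""
        release = self.releases.pop((dist, location), None)
        if release is None:
            return False
        for path in release["files"]:
            Path(path).unlink(missing_ok=True)
        return True

    def verify(self, dist, location):
        """Return the recorded files that are missing or have changed."""
        release = self.releases[(dist, location)]
        bad = []
        for path, digest in release["files"].items():
            if not Path(path).is_file() or _sha256(path) != digest:
                bad.append(path)
        return sorted(bad)

    def installed(self):
        """List (distribution, location, version) for every release."""
        return sorted((d, loc, r["version"]) for (d, loc), r in self.releases.items())
```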

It would also store the MYMETA data which gives us a lot of information (such
as dependencies) for free.

This is all totally doable, and efficient enough, with a small pile of DBM
files and Storable.  Where to put the database is a bit more complicated, see
the list of open problems below.
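
As a rough feasibility check, here is what "DBM plus a serializer" looks
like in Python, using its dbm and pickle modules as stand-ins for Perl's
DBM files and Storable.  The key names ("release:", "__schema_version__")
and the record fields are assumptions made for the example, not a
proposed layout.

```python
import dbm
import pickle

SCHEMA_VERSION = 1  # hypothetical starting version of the layout
PREFIX = b"release:"


def save_release(db_path, dist, record):
    """Store one release record (a plain dict) under its distribution name."""
    with dbm.open(db_path, "c") as db:
        db[b"__schema_version__"] = str(SCHEMA_VERSION).encode()
        db[PREFIX + dist.encode()] = pickle.dumps(record)


def load_release(db_path, dist):
    """Return the stored record, or None if the distribution is unknown."""
    with dbm.open(db_path, "r") as db:
        key = PREFIX + dist.encode()
        return pickle.loads(db[key]) if key in db else None


def list_distributions(db_path):
    """Answer "what distributions are installed?" by scanning the keys."""
    with dbm.open(db_path, "r") as db:
        return sorted(
            k[len(PREFIX):].decode() for k in db.keys() if k.startswith(PREFIX)
        )
```

A record here could carry the version, install location, file checksums
and the MYMETA data all in one serialized value, which keeps every query
listed earlier down to a key lookup or a key scan.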

There's lots and lots and lots of additional information which could be stored
and queries and operations to allow, but if we can get the basics working
it'll allow a heap of new solutions.  And I think this is a SMOP.

Future possibilities include...

* Auto-upgrade to SQLite if ExtUtils::Install::DB::SQLite is installed.

If a special module is installed we can offer SQLite support (or whatever) for
a more advanced database.  At install time it would copy the existing DBM
system into its own database.

In general, more functionality can be added as more optional (or bundled)
dependencies are available to the system.  Through it all the basic DBM
database would continue to be redundantly maintained to provide a fallback
should those optional modules break or go away.
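
The copy step could be as dumb as the sketch below, shown in Python with
its sqlite3 module.  The one-table "kv" schema and the function name are
assumptions; a real SQLite backend would presumably unpack the records
into proper columns so they can be queried.

```python
import sqlite3


def copy_to_sqlite(dbm_items, conn):
    """Copy (key, value) byte pairs from the DBM store into SQLite.

    Safe to re-run: existing keys are overwritten, so the DBM files can
    stay the source of truth while SQLite is kept as a richer mirror.
    Returns the number of rows now in the table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", list(dbm_items)
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
```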

* Extra hooks into the install system.

ExtUtils::Install is sort of a black box.  If it started to do more than just
copy files it would need a more interesting API.  Rather than trying to cram
more options into install() it would be worthwhile to write a new API.  Build
systems can check for the existence of the new API and use that if available
and do more interesting things with the database.  This would be necessary to
support uninstall.

Problems include...

* Anything installed before the new ExtUtils::Install is lost.

Just have to live with that.  It will slowly go away as the new
ExtUtils::Install gets into core, Perl is released and vendor Perls update
their Perl or core modules.  It'll take time, but we're in the long run here.

* Anything installed outside the normal blib process is lost.

Initially, this is an acceptable loss.

Ideally the install process would be expanded to better deal with things which
are not Perl libraries or programs.  Build systems which have their own
methods of installing these things could add them directly using the install
database API.  A lot of hand waving here.

* Upgrading the database.

I'd like to put some thought into how things are laid out initially to avoid a
lot of major revisions, and thought into what information should be recorded
so it's available later, but eventually we're going to want to change the
"schema", such as it is with DBM files.

I figure this can happen as part of upgrading ExtUtils::Install.  It checks
what version of the database you have and performs the necessary transforms to
bring it up to the current version.  We know how to do this, just have to keep
it in mind and remember to implement it.
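
The usual shape for this is a chain of one-step transforms keyed by
version, sketched here in Python.  The version numbers and the
particular v1 to v2 change (adding a "location" field) are made up for
the example.

```python
CURRENT_VERSION = 2  # hypothetical


def _v1_to_v2(db):
    # Made-up change: v2 records where each release was installed.
    for record in db["releases"].values():
        record.setdefault("location", "unknown")


# Each entry upgrades a database from that version to the next one.
MIGRATIONS = {1: _v1_to_v2}


def upgrade(db):
    """Apply transforms one step at a time until db is current."""
    version = db.get("schema_version", 1)
    while version < CURRENT_VERSION:
        MIGRATIONS[version](db)
        version += 1
        # Record progress after every step so an interrupted upgrade
        # can resume from where it stopped.
        db["schema_version"] = version
    return db
```

Because every step only knows about the version before it, a very old
database walks through each intermediate layout instead of needing a
direct path to the newest one.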

* Where to put the database?  What about non-standard install locations?

$Config{archlib} would seem the obvious location, but it presents a
permissions problem.  If a non-root user installs into their home
directory, you don't want them needing root to write to the installation
database.  There are several ways to deal with this.

One is to simply not record non-standard install locations, but this loses
data and punishes all those local::lib users out there.

Another is to have a separate install database for non-standard install
locations.  This makes sense to me, but it brings in the sticky problem
of having to merge install databases.  Sticky, but still a SMOP.  Once you
have to implement merging anyway, it now makes sense to have an install
database for each install location.  One for core.  One for vendor.  One for
site.  And one for each custom location.  This better fits how Perl layers
module installs and has some clear advantages.

    * allows separation of permissions
    * allows queries of what's installed based on what's in @INC

That second one is important.  When a normal user queries the database, they
want to get what's installed in the standard library location.  When a
local::lib user queries the database, they want to get what's installed in the
standard library locations AND their own local lib.
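
With one database per location, that query is a merge driven by the
user's @INC.  A minimal sketch in Python, assuming each location's
database has already been loaded into a plain dict of distribution to
version (the function name and data shapes are hypothetical):

```python
def visible_distributions(inc, databases):
    """Merge per-location databases in @INC order.

    inc       -- ordered list of library directories, like @INC
    databases -- maps a library directory to {distribution: version}

    Returns {distribution: (version, directory)}.  The first directory
    in inc that has a distribution wins, mirroring how Perl loads the
    first matching file it finds in @INC.
    """
    visible = {}
    for directory in inc:
        for dist, version in databases.get(directory, {}).items():
            visible.setdefault(dist, (version, directory))
    return visible
```

A normal user's @INC only lists the standard locations, so they only
see those databases.  A local::lib user has their own directory at the
front of @INC, so they see their local installs layered over the
standard ones, with no special casing in the query.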

In summary...

Not perfect, but gets us off the ground.  It's not a great database, but it
does the important job of recording the critical install-time data for later
use.  It's implementable within the current system.  It doesn't require a bunch
of dependencies, just one upgrade.  It works with most existing module
releases.  It solves a major design problem with the Perl module system.

I think it's a Simple(?!) Matter Of Programming in ExtUtils::Install to get it
off the ground.  IMO the most important bit of coordination is putting some
thought into what the basic database should look like so we don't have to
worry about complicated upgrades later.

