develooper Front page | perl.perl5.porters | Postings from July 2008

Re: revive undump / unexec?

Tom Christiansen
July 27, 2008 12:26
In-Reply-To: Message from Reini Urban <> 
   of "Sun, 20 Jul 2008 13:05:08 +0200." <> 

> Anyone investigated reviving undump for the platforms where the
> supporting code already exists: elf and coff on i386, besides a.out
> and solaris?

Great question.  Pity there's been no answer.

Is the motivation to skip the compilation phase and so ship/distribute in
effect a precompiled binary that doesn't need a zillion auxiliary packages,
or did you have something else in mind?

> For example take it from emacs or some scheme packages, or even parrot.
> These support even more platforms.

Interesting, especially the lattermost.

> Tom Christiansen once added unexec from emacs to perl as I read
> in perl.c.

I've read that, too, but no longer recall having done so.  Maybe it was
back in the 80s, so please don't blame early-onset Alzheimer's--yet. :-(

Nonetheless, considering that I once upon a time, long ago and far away,
did have actual personal experience across several different arenas with
this particular sort of thing, I can well believe it of the then-me.

For people who might care to investigate further, I'll point out some
earlier work on related matters, and then some later stuff.  In poking
around, I chanced upon a note by a Python-user lamenting that language's
lack of checkpoint/restart, confirmed by Guido.  So if anybody's looking
for coolness kudos or checkoff-box bonuses... :-)

Those uninterested in undump and its brethren should hit "q" now.  


My first experience with these things was in the mid-1980s during my
undergraduate days at UW-Madison.  I implemented an rfork() function,
where "r" stood not for "restricted" but rather for "remote".  You have
to be able to do an undump/unexec before you can build the remoteness on
it.  I *think* I did this work under Marv Solomon as an undergraduate,
although my later, much more substantial O/S work there as a grad
student was done under Bart Miller, then a recent Berkeley PhD.

Along with Michael Litzkow, Marv published a paper on "Supporting
Checkpointing and Process Migration Outside the UNIX Kernel" for the 1992
Winter USENIX in San Francisco, so it was something he'd spent some time
with.  We had a machine room with a big "rack" of VAX 750s that looked a
lot like a laundromat.  I kid you not; see "Condor".  We even put paper
photocopies of dryer doors on them. :-)

Litzkow also has an earlier paper, "Remote Unix - Turning Idle
Workstations into Cycle Servers" published at the 1987 Summer USENIX in
Phoenix.  Given how that's the very season I finally escaped from
university, there may be stuff of mine at least referenced there.

Though I find no direct mention of my rfork() work, and my 9-track tape of
my university days has no place to hang, I did find a longish, decade-old
thread discussing the pros and cons of rfork() by one Mr Larry McVoy, whom
some of you may know.  Now, turns out that Larry was a compeer (confrere?)
of mine in those times and places, so he may have remembered my
earlier work, or Marv's with Condor at the UW.  See what Larry has to say 
here, where he argues that rexec() is preferable to rfork():

Checkpoint/restart was *very much* in the air at the time.  See, for
example, the 1995 paper, "Libckpt: Transparent Checkpointing under UNIX"
from Winter USENIX in New Orleans, or the shorter article, "POSIX.1h
SRASS and POSIX.1m Checkpoint/Restart" in the February 1999 edition of
USENIX's periodical newsletter, _;login:_.  

My other experience, less direct because I was not its implementer, was
under Convex O/S, a 4.3 BSD variant.  The Convex C-series machines were
air-cooled vector machines, which in later incarnations were also
multiheaded SMP bigboxes.  We did this because we had users whose
long-running, critical calculations *had* to be restartable. 

The implementation here was much better than mine or Condor's, because
it allowed checkpointing not just entire process groups (consider
pipelines), but indeed, all processes and kernel state for an entire
machine.  This was a *big* deal!

Imagine your super-UPS has only 5 minutes; how do you save *everything*?
The only thing we didn't handle was socket connections that weren't on
the same machine.  It was really very slick, even more so when you
consider its thread-spawning was a *machine-language instruction*
(called fork, and paired with join) that provided asymmetric parallelism
without O/S involvement--save on the join, which was only to gather
special thread registers' acct data for the rusage structure.

Some very large computing centers would have batch daemons managing
submissions spread across *several* Convex machines, each being an SMP
multiheaded monster with shared-via-NFS filesystems.  I've a vague wisp
of memory that checkpoint/restart came into play here, too, but don't
remember any details, and don't really want to dig out the old manuals.

When you control the design of the hardware *and* the compiler *and* 
the operating system *and* the userland utilities, you can produce truly
remarkable works that would be otherwise impossible.

I know Jim Mankovich (jman) has an ACM paper about the fork/join hardware
instructions and unique asymmetric parallelism.  But I don't recall whether
it was jman, Tom Watson, or any of several others who shouldered the bulk
of the checkpoint/restart work for the Convex system.  I believe you'd
killpg with SIGCHKPNT or some such, but details escape me.  Last I knew
of Tom (which is *far* from recent), he was working on (or thinking about,
which isn't all that different) NUMA clouds, I think for HP.  
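
For readers who have not seen signal-driven checkpointing, here is a
minimal Python sketch of the general pattern: a handler sets a flag, and
the job writes its checkpoint at the next safe point.  Everything in it is
an assumption for illustration.  SIGUSR1 stands in for the Convex signal,
whose real name is not confirmed above, and the names
request_checkpoint(), install_handler() and work() are hypothetical.  It
says nothing about how the Convex kernel actually did it.

```python
import os
import signal

checkpoint_requested = False


def request_checkpoint(signum, frame):
    """Signal handler: only set a flag; the real work happens outside it."""
    global checkpoint_requested
    checkpoint_requested = True


def install_handler(sig=signal.SIGUSR1):
    # SIGUSR1 is a stand-in (assumption); the original signal name is unknown.
    signal.signal(sig, request_checkpoint)


def work(steps, on_checkpoint):
    """Run `steps` units of work; call on_checkpoint(i) when signalled.

    Returns the list of step numbers at which a checkpoint was taken.
    """
    global checkpoint_requested
    taken = []
    for i in range(steps):
        if checkpoint_requested:
            checkpoint_requested = False
            on_checkpoint(i)  # a real job would save its state here
            taken.append(i)
        # ... one unit of real computation would go here ...
    return taken
```

A controlling process would signal a whole pipeline at once with
os.killpg(pgid, signal.SIGUSR1); a single process can try the pattern on
itself with install_handler() followed by
os.kill(os.getpid(), signal.SIGUSR1) before calling work().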

I don't remember for sure whether it was with Solomon or Watson that we
came to the conclusion that process-migration to a remote/distinct host
(and hence lacking the original memory set) was simply way too expensive
for nearly anything.  You get a lot more buck for your bang just by
initially selecting the *right* host to run your program on, and leaving
it there.  Probably Solomon, though.

Only for those jobs which ran for weeks or months and thus had to be able
to survive machine crashes--even their host-computer's permadeath--did it
make sense to do anything else in that environment.  In extremis, you'd
resurrect the checkpointed job on a different box altogether, and there'd
be *much* rejoicing over time unlost.  Months are months.
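
The save-and-resurrect idea can be sketched at application level in a few
lines of Python.  This is not the transparent, whole-process mechanism
described above: the job here cooperates by saving its own state.  The
function names, the JSON file format and the `crash_at` parameter (which
merely simulates a failure) are all assumptions for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # readers see the old file or the new, never half


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {"next": 0, "total": 0}


def run(path, limit, every=1000, crash_at=None):
    """Sum the integers 0..limit-1, checkpointing every `every` steps."""
    state = load_checkpoint(path)
    for i in range(state["next"], limit):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += i
        state["next"] = i + 1
        if state["next"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If run() dies partway through, calling it again with the same path (on the
same box or on another one that can see the file) picks up from the last
checkpoint, so at most `every` steps of work are lost.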

But that was in the day of one big box doing one big computation.
That's not the model so often seen now in these latter days, whether 
we're talking about SETI hunters, Mersenne prime prospectors, or 
those "wowsya" feats of fancy-dancy animation that pull in gigabucks 
worldwide at the cinema and through DVD sales.

The big rendering warehouses, such as those used by George Lucas or Peter
Jackson, certainly do distributed rendering.  But IIRC, on node-failure
there, they just rerun the whole thing (well, the subportion whose node
died)--*and* blindly replace (=junk) the poor little Blade that flaked out
on them.  Seems harsh to me and financially suspect, but for them, I guess
this is less expensive in the long run than figuring out what sort of
cosmic-ray event may have wigged out their hardware.

But gosh, talk about a throw-away, consumerist mentality: ouch!

> It would need a little bit of restructuring the undump code though
> which only went the solaris way together with the external undump
> utility, if I remember correctly.

It seems to me that the LSF utilities for cluster-computing may have
something that would prove useful for this.  They, too, need to support
some sort of rfork() or, more probably, a kind of checkpoint/restart on a
same-instruction-set box in the cluster.  But I don't know anything about
code availability or the terms of their licence.

A road possibly more likely to lead to paydirt might be looking into the
checkpoint/restart work done for Linux.  Berkeley Labs did some, but there
have been several others, too.  I think checkpoint/restart may even have
made it into the LSB, but haven't looked it up in the most recent revs.

Good luck!


    "Those who cannot remember the past are condemned to repeat it."
	--George Santayana in "The Life of Reason, Volume 1" [1905]