perl.perl5.porters | Postings from April 2021

RFC - "Saving" perl

From:
B. Estrade
Date:
April 12, 2021 05:15
Subject:
RFC - "Saving" perl
Message ID:
5e4525bf-b6f0-2de8-e7f1-317e6047066e@cpanel.net
This is broken up: the general idea is at the top, ramblings after that. I
appreciate the consideration that's been afforded this topic and the
patience everyone has shown during this exercise. Everything below is mine and
mine alone. It expresses no one else's opinions. Just to be "transparent":
I consulted no one while writing it or before sending it.

I. EXECUTIVE SUMMARY

I'll present the idea, then I'll probably write a ton of stuff to provide
background and context, plus a lot of stuff that is irrelevant. Such is my
burden; here it is. Also note, this is not my idea. I just happen to
recognize its importance and potential to be a tremendous catalyst.

IDEA

I should say upfront, I am under the impression we don't have anything
like what I am about to describe. If I am mistaken and we do, then great;
we're halfway there.

The perl runtime would greatly benefit from a simple (but not too simple) API
or mechanism by which execution contexts may be managed (saved, resumed,
inspected, etc.). The execution context represents the overall state [0] of
the program at the time it's saved. Note: the book in which [0] is a chapter
is a dense read, but the whole thing is highly recommended.

Or, conversely, we should seek to assist LeoNerd in his current async/await
work, to see that a generally usable approach is made available as an artifact
of the implementation (this is not on him IMO, it's on *us*).

In time-sharing operating systems (Unix), this is known as "context
switching". When using git, it's the "git stash" interface. See [7,8] for
more reading.

SYNOPSIS

If we can't have more than one execution context at a time, let's fake it in
a useful way: provide a way to save a context, resume it, and even manipulate
more than one context at a time (e.g., "merging"; more on this later).
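To make the save/resume idea concrete, here is a tiny conceptual sketch in Python (not Perl, and not any existing perl API; all names are made up): a generator is already a miniature "execution context" that can be suspended and resumed with its state intact.

```python
# Illustrative only: each generator below acts as a small "execution
# context" that can be suspended (yield) and resumed (next), keeping
# its locals -- its "state" -- across suspensions.

def worker(label):
    total = 0
    for i in range(3):
        total += i
        # Suspend here; whoever holds the generator decides when we resume.
        yield (label, total)

# "Saving" two contexts is just holding on to the suspended generators.
ctx_a = worker("a")
ctx_b = worker("b")

# Resume them in any interleaving; each picks up exactly where it left off.
trace = []
for _ in range(3):
    trace.append(next(ctx_a))
    trace.append(next(ctx_b))

print(trace)
# -> [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('a', 3), ('b', 3)]
```

The point of the sketch is only that "save" and "resume" need not mean OS threads: a first-class handle on a suspended computation is enough to fake interleaved execution.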

I expand a little below, but to make the executive summary complete: it's my
understanding that "80%" of LeoNerd's effort behind async/await is spent
solely on context management. Apologies if this point was misunderstood, but I
don't think it was. Nonetheless, it only encouraged what I had already
concluded was very important during that "chat".

<!-- you can stop here if you just wanted the general idea -->

II. LONGMESS

The perl "run time" is a single, monolithic execution context. When we fork,
we're admitting this. We're also implicitly admitting another thing: that
fork, while very useful, only creates clones of the parent process, and the
clones are non-communicating. So one important and meaningful concept is that
"perl" is great at facilitating and managing any number of "non-communicating
sequential processes" [5,6].

This has cost untold amounts of time and mental anguish for people who tried
to solve this "problem" and were forced to seek alternatives when they really
wanted to use perl/Perl. Is it a problem? Not if you accept that a perl process
is a single execution context. It is part of the nature of Perl. This is
where I am.

It wasn't that long ago that single CPUs were normal. The race to get
"faster" was measured in MHz and GHz. People overclocked their single CPUs,
paid way too much for a few MB of RAM, installed FreeBSD or Gentoo so they
could tweak every program they installed, etc. Many remember those days.

It *was* quite a while ago that Unix came onto the scene. Its main selling
point was that it was a "time sharing" operating system. Not to get into "why
did Unix succeed where others failed"; I will just point this out: the
time-sharing aspect of Unix allowed multiple users to be on the machine at the
same time, though each user generally "felt" like they had exclusive access
to it.

Time sharing was so popular that this thing called Linux came about. And
before I get accused of providing a terrible history lesson, I will end by
pointing out that *most* of us first installed Linux (or a BSD) many years ago
on single-CPU machines. Some of us might still do that for sport. The point is
that the "time sharing" approach worked, even on those machines.

Getting back to the perl "runtime", what does thinking "perl/Perl is a
uniprocessor operating system" get us? A lot of things, actually.

Operating systems research was for a long time focused on the uniprocessor
model. And there are a lot of things that go into presenting a "time sharing"
OS experience for human beings. Some of those "things" might one day be
considered appropriate for the perl runtime, but one in particular I believe
will help us move ahead right now: the concept of *context switching*. It
was so powerful, in fact, that people felt like they were the only ones on the
computer. More than that, all of the many processes running on the computer
felt like *they* were the only process on the machine. There I go
anthropomorphizing computer stuff again. Sorry.

BACKGROUND

Not to be dramatic, but I've had a slowly growing epiphany. And today after
chatting with some *very* patient folks on #p5p, the path forward dawned on
me.

1. formalize the notion that the perl "run time" is a single context thread
2. don't fight it
3. embrace it

Most stop digging when they realize they're in a hole. I think our situation
requires us to get better shovels, upgrade to a steam shovel, or even invest
in a few tunneling machines. I suspect we'll bust out on the other side of
the world and be glad we did. It might get hot for a while, but there's really
no alternative.

What does embracing it look like? It looks like this: if we've got this
amazingly powerful, albeit uniprocess, runtime, look into the past and
consider how those brilliant folks dealt with having just a single CPU.

How did this come from a conversation with the good folks on #p5p? LeoNerd
was discussing his async/await work. And I had been sufficiently convinced
that no amount of magic could present a "real" asynchronous environment; I
mean, "async" is just like threads but far less powerful. This is why
"futures" and "promises" are a thing. But still, I persisted: HOW is this
possible? Here's how, based on my limited understanding (and, by the way, I
think it is brilliant; it was the final piece of the puzzle for me):

0. main execution context is running; all the async stuff is in a "run loop"

1. async/await is called

2. execution contexts are then managed using suspension/resumption code
CUSTOM-written for this purpose

[LeoNerd correct me if I am wrong here, don't mean to mischaracterize your
work]
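As I understand the steps above, the shape is: one run loop owning many suspendable contexts. A minimal sketch of that shape in Python (emphatically NOT LeoNerd's implementation; every name here is invented for illustration):

```python
# Sketch of the run-loop idea described in steps 0-2 above.
# A round-robin loop resumes suspended "contexts" (plain generators
# standing in for async tasks) until all of them have finished.
from collections import deque

def run_loop(tasks):
    """Drive suspended contexts round-robin until all complete."""
    ready = deque(tasks)
    results = []
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the context...
            ready.append(task)  # ...and re-queue it when it suspends again
        except StopIteration as done:
            results.append(done.value)  # context finished; keep its result
    return results

def counting_task(n):
    for _ in range(n):
        yield       # the "await": suspend, hand control back to the loop
    return n

print(run_loop([counting_task(2), counting_task(1)]))  # -> [1, 2]
```

In this framing, the hard part (the "80%") is everything around `next(task)`: capturing, stashing, and restoring enough of the interpreter's state that a context can genuinely be suspended and resumed.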

Then I said something to the effect of, "seems like support for context
switching 'execution' contexts would help you out there". And, at the risk of
taking his comment the wrong way: he said about 80% (EIGHTY PERCENT) of the
code is in place to manage the execution context. WTF.

To his great credit, he also said he suggested the same or a similar approach
some years ago. I definitely believe that! Furthermore, I believe such an
approach's time has come.

WHY IS THIS IMPORTANT

Let's generalize the implications of having the ability to "context switch"
easily.

The implementation of async/await can be described as, "enabling distributed
programming semantics in an environment that may allow one and only one
'execution context' to be active at any time".

In this view, it doesn't matter if it is async/await, "threading" (fork/join),
or some other concurrency model that is 'hot'. It boils down to enabling
support for the SEMANTICS of the concurrency paradigms in an inherently
serialized way.

There is a Computer Science concept for doing this "correctly" given the
semantics of the paradigm you are literally faking.

Some people call this concept "linearizability". Others call it "sequential
consistency".

What does "sequential consistency" mean? From [1,2]:

     ...the result of any execution is the same as if the
     operations of all the processors were executed in some
     sequential order, and the operations of each individual
     processor appear in this sequence in the order
     specified by its program.

What does "linearizability" mean? From [3,4]:

     In concurrent programming, an operation (or set of
     operations) is linearizable if it consists of an
     ordered list of invocation and response events
     (callbacks), that may be extended by adding response
     events such that:

     + The extended list can be re-expressed as a sequential
       history (is serializable).
     + That sequential history is a subset of the original
       unextended list.

     Informally, this means that the unmodified list of
     events is linearizable if and only if its invocations
     were serializable, but some of the responses of the
     serial schedule have yet to return.

(incidentally, the main author of [4] is a very well-known software
transactional memory researcher)

For us, this means: if you're going to fake a multi-process environment in a
uni-process model, make damn sure it behaves the same as it would IRL.
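One concrete way to read that requirement: however the faked scheduler interleaves two contexts, the observable result must match *some* purely serial execution. A toy Python check of exactly that property (all names invented; each "process" is just a list of atomic operations on shared state):

```python
# Toy sequential-consistency check: every interleaving that preserves
# each process's program order must produce a result also reachable by
# SOME fully serial execution of the two processes.

# Two "processes" as lists of atomic operations on a shared dict.
A = [lambda s: s.update(x=1), lambda s: s.update(x=s["x"] + 1)]
B = [lambda s: s.update(y=10)]

def run(schedule):
    """Execute a schedule of operations against fresh shared state."""
    state = {}
    for op in schedule:
        op(state)
    return state

def interleavings(a, b):
    """All schedules preserving each process's internal order."""
    if not a:
        yield list(b); return
    if not b:
        yield list(a); return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

serial_results = [run(A + B), run(B + A)]
assert all(run(s) in serial_results for s in interleavings(A, B))
print("all interleavings are sequentially consistent")
```

Here the check passes trivially because A and B touch disjoint keys; the burden the text describes is making it hold for the interesting cases, where operations conflict.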

The burden of ensuring "sequential consistency" is squarely on whomever is
implementing the required support of the semantics of the concurrent
programming model they are adding, in our case, to perl/Perl.

It also follows that if someone wanted to implement, oh idk, a shared memory
programming environment (e.g., threads; or OpenMP's 'fork'/'join' model);
then they would also have to correctly implement this "sequential
consistency". That's a HUGE burden alone - though a necessary one.

The main point: this is IMPORTANT because extending Perl semantics to support
interesting programming paradigms (which is one of its greatest strengths)
currently places a dual burden on anyone implementing said semantics:

a. ensure sequential consistency (rightly so)

b. implement their own way of managing execution contexts (the horror!,
srsly this is an awful situation)

It is my conclusion that by marshaling our brains and resources, the CORRECT
"next" thing to provide is a standard way of managing execution contexts,
specifically for the purpose of enabling any new programming paradigm, which
all seem to be multi-process. Without this key feature, we are going to get
no more meaningful semantics, ever.

For me, and I hope I've convinced the right people, this is a mortal wound
that we need to fix ASAP.

BABBY'S NEXT STEP

Fortunately, I do not *think* providing this is such a great burden. And far
be it from me to suggest *what* to do, but I will propose a straw-man "plan"
of action that is hopefully clear and doable, and provides a clear high-level
road map:

0. formalize that the "perl run time" is a uniprocess environment (the OS
analogy may or may not help)

1. provide a minimal, practical, not totally boneheaded way for those working
    on semantics for "new" programming model support (async/await/futures,
    fork/join, etc.) to context switch execution contexts.

2. judiciously and incrementally add to this general API layer based on the
    needs of supporting additional concurrent programming model semantics

3. enjoy the newfound path ahead for new features that are consistent with
    the perl/Perl we all know and love

THE FUTURE

I added this section for additional motivation. With a way to easily capture,
save, and restore execution contexts, there are suddenly a great many
interesting possibilities for perl/Perl. Some relate to real
multi-processing, some to new (and consistent!) semantics, and some to
solving people problems that have built up over the years because of how
*badly* so many people want a path forward (outside of the coolness factor,
the latter is actually where I hope the greatest real benefit lies).

As for what abilities, beyond save/restore of execution contexts, such a
layer can provide: well...one that I think is particularly exciting would be the
ability to "merge" execution contexts. This would be required, for example,
if we wished to present software transactional memory type semantics on top
of any simulated "shared memory" we'd use to support actual SMP semantics.
It'd also provide the basis for experimentation with semantics related to
'lock free' data structures. The list gets longer and more exotic the more
one thinks about it.
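A toy sketch of what "merging" two contexts could mean, borrowing the software-transactional-memory flavor mentioned above (Python, all names hypothetical): each context works on a private snapshot of shared state, and a merge commits non-conflicting writes while rejecting conflicting ones.

```python
# Toy STM-flavored "merge" of two execution contexts' private snapshots.
# Purely illustrative; nothing like this exists in perl today.

def merge(base, *snapshots):
    """Commit each snapshot's writes onto base. Two snapshots writing
    different values to the same key is a conflict and aborts the merge."""
    merged = dict(base)
    written = set()
    for snap in snapshots:
        for key, value in snap.items():
            if base.get(key) == value:
                continue                      # unchanged: not a write
            if key in written and merged[key] != value:
                raise ValueError(f"conflict on {key!r}")
            merged[key] = value
            written.add(key)
    return merged

base = {"x": 1, "y": 2}
ctx1 = {"x": 1, "y": 5}         # this context wrote y
ctx2 = {"x": 7, "y": 2}         # this context wrote x
print(merge(base, ctx1, ctx2))  # -> {'x': 7, 'y': 5}
```

A real design would also need a policy for conflicts (retry, last-writer-wins, user hook), which is exactly where the STM literature cited above comes in.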

In addition to this, here are some other things that seem very possible as a
result of taking this first step:

a. once sufficiently encapsulated, this "context management" layer could be
the basis for enabling actual multi-processing; it could be that the "perl
runtime" may not even need to be aware that things are happening concurrently.

b. #a could also lead to some real collaboration among many individuals who
are highly capable and interested in the area of "run times". That's not my
aim; I am just saying that it could provide an opportunity for some folks we
maybe have not seen in a while to work on very interesting things in that
*layer*.

CONCLUSION

I will wrap it up with the following strongly held convictions:

* it will serve us well to look at how uni-process time-sharing operating
   systems solve things; it may inspire us with new language features or
   solutions to sticky situations in terms of perl

* if we want to enable semantic extensions to Perl that uniformly implement
   multi-process paradigms in a uni-process environment, we absolutely must
   provide standard capabilities for managing contexts; likely there are other
   capabilities needed (e.g., "run loop" support? idk if that's a thing).

III. APPENDIX A - FAQ

1. You went from pushing "real" SMP to "fake" multi-processing by proposing a
way to save and resume context states.

Yes, busted. I feel this is the best way forward. And it could fall out of the
work that LeoNerd is doing now for async/await. It's already being done;
let's try to make it reusable, then actually reuse it (I would).

2. You sound pretty confident that this will solve all of our problems; what
if it doesn't?

It's NOT a silver bullet. While it's highly likely not to be "sufficient"
for some semantic extensions to Perl, I do think it's "necessary"; i.e., we
will need to do this no matter what, IMO.

3. You speak of this as if it's easy.

I don't think this is easy, but I do think that it'd be a substantial force
multiplier for anyone interested in extending Perl into "interesting"
programming model areas. Going back to what I said above: all the
"low-hanging fruit" has been picked. Time to put on our adult pants and move
forward.

4. It sounded like you're thinking about this as a "virtual machine" layer;
are you?

NO. I could write another 5,000 words on what I observed starting with pugs,
parrot, moarvm, rakudo, and perl 6. If anything, I am suggesting the opposite
approach: provide the minimum functionality one might need to fake
multi-processing in our uni-process model. That doesn't sound like a VM
layer to me; it sounds like actually supporting semantic extensions for new
and unanticipated things.

REFERENCES

[0] Edsger W. Dijkstra, "A Discipline of Programming", Chapter 2, "STATES AND
THEIR CHARACTERIZATIONS"

[1] https://en.wikipedia.org/wiki/Sequential_consistency

[2] Leslie Lamport, "How to Make a Multiprocessor Computer That Correctly
Executes Multiprocess Programs", IEEE Trans. Comput. C-28, 9 (Sept. 1979),
690-691.

[3] https://en.wikipedia.org/wiki/Linearizability

[4] Herlihy, Maurice P.; Wing, Jeannette M. (1990). "Linearizability: A
Correctness Condition for Concurrent Objects". ACM Transactions on
Programming Languages and Systems. 12 (3): 463–492. CiteSeerX 10.1.1.142.5315

[5] http://www.usingcsp.com/cspbook.pdf

[6] https://en.wikipedia.org/wiki/Communicating_sequential_processes

[7] https://en.wikipedia.org/wiki/Coroutine#Implementations_for_C

[8] https://docs.python.org/3/reference/compound_stmts.html#async
