
Start of thread proposal

Dan Sugalski
January 19, 2004 12:12
I've not gotten into the technical bits yet. That's next, but rip 
this apart first.


So we can all talk about things the same way, the following
definitions apply. Some of these are drawn from the POSIX thread spec,
and as such we should have a translation section at the end.

=over 4

=item THREAD

An OS level thread. If that makes no sense, neither will any of the
rest of this, in which case I recommend picking up "Programming with
POSIX Threads" by Dave Butenhof, and coming back when you have.

=item MUTEX

This is a low level, under the hood, not exposed to users, thing that
can be locked. They're non-recursive, non-read/write, exclusive
things. When a thread gets a mutex, any other attempt to get that
mutex will block until the owning thread releases the mutex. The
platform-native lock construct will be used for this.
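
To make the "non-recursive, exclusive" part concrete, here is a minimal sketch that uses Python's C<threading.Lock> as a stand-in for the platform-native lock. The helper name C<try_lock_from_other_thread> is made up for illustration, and the attempts are non-blocking so the demo can't hang.

```python
import threading

mutex = threading.Lock()  # exclusive and non-recursive, like the MUTEX above


def try_lock_from_other_thread(lock):
    """Return True if a second thread could take the lock without blocking."""
    result = []

    def worker():
        got = lock.acquire(blocking=False)
        result.append(got)
        if got:
            lock.release()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return result[0]


mutex.acquire()
held_elsewhere = try_lock_from_other_thread(mutex)  # owner still holds it
recursive_attempt = mutex.acquire(blocking=False)   # same thread, second grab
mutex.release()
free_again = try_lock_from_other_thread(mutex)      # released, so obtainable
```

A blocking acquire in either of the first two cases would wait until the owner released the mutex, and for the recursive attempt that means forever.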

=item LOCK

This is an exposed-to-HLL-code thing that can be locked. Only PMCs can
be locked, and the lock may or may not be recursive or read/write.


=item CONDITION VARIABLE

The "sleep until something pings me" construct. Useful for queue
construction. Not conceptually associated with a MUTEX. (POSIX
threads require this, so we're going to hide it there behind macros
and/or functions)
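
As a rough sketch of the queue-construction use, the following builds a tiny blocking queue on Python's C<threading.Condition>. The class name C<PingQueue> is hypothetical. Note that Python, like POSIX, ties the condition to an internal lock; that pairing is exactly the sort of detail the macros and/or functions would hide.

```python
import threading
from collections import deque


class PingQueue:
    """Hypothetical blocking queue: get() sleeps until put() pings it."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # wraps its own lock internally

    def put(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one sleeper

    def get(self):
        with self._cond:
            while not self._items:  # loop guards against spurious wakeups
                self._cond.wait()
            return self._items.popleft()


q = PingQueue()
received = []
consumer = threading.Thread(
    target=lambda: received.extend(q.get() for _ in range(3))
)
consumer.start()
for n in (1, 2, 3):
    q.put(n)
consumer.join()
```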


An HLL version of a condition variable.


=item INTERPRETER

Those bits of the Parrot_Interp structure that are absolutely required
to be thread-specific. This includes the current register sets and
stack pointers, as well as security context information. Basically if
a continuation captures it, it's the interpreter.


=item INTERPRETER ENVIRONMENT

Those bits of the Parrot_Interp structure that aren't required to be
thread-specific (though I'm not sure there are any) I<PLUS> anything
pointed to that doesn't have to be thread-specific.

The environment includes the global namespaces, pads, stack chunks,
memory allocation regions, arenas, and whatnots. Just because the
pointer to the current pad is thread-specific doesn't mean the pad
I<itself> has to be. It can be shared.


=item INDEPENDENT THREAD

A thread that has no contact I<AT ALL> with the internal data of any
other thread in the current process. Independent threads need no
synchronization for anything other than what few global things we
have. And the fewer the better, though alas we can't have none at all.

Note that independent threads may still communicate back and forth by
passing either atomic things (ints, floats, and pointers) or static
buffers that can become the property of the destination thread.


A thread that's part of a group of threads sharing a common
interpreter environment.

=back



=head2 Supported Models

=over 4

=item POSIX threads

The threading scheme must be sufficient to support a POSIX
share-everything style of threading, such as is used in perl 5's
"pthread" model, as well as the thread models for Ruby and Python.

=item "Process-type" threads

The scheme must also support the perl 5 "iThreads" threading
model. In this model no data is shared implicitly, and all sharing
must be done on purpose and explicitly. It very much resembles the
Unix fork-process-with-shared-memory-segment model, not a surprise as
it was originally developed with

=back


=head2 Guarantees

=over 4

=item No Crashes

The interpreter guarantees that no user program errors of any sort
will crash the interpreter. This includes threading problems. As
such, synchronization issues (where multiple interpreters are
accessing the same shared data) must not crash the interpreter or
corrupt its internal state.

=back


=head2 Assumptions

=over 4

=item System memory allocation routines are threadsafe

We are assuming that the memory allocation system of the base OS is
threadsafe. While some of the C runtime libraries are notoriously
thread dangerous, memory allocation code generally is threadsafe, and
we'll assume that on all platforms. (Though we will, in general,
manage our own memory)

=back


=head1 Proposal

The proposal is as follows:

=over 4

=item All global state shall be protected by mutexes

Straightforward enough. This allows for independent threads to
coexist without threatening the state of the process.

=item Multiple independent interpreters will be allowed

Once again, straightforward enough. With threadsafe protected global
state, there's no issue here.

=item Only one OS thread in an interpreter at once

While there is no requirement that any interpreter be tied to an
underlying OS thread, under no circumstances may multiple OS threads
use a single interpreter simultaneously.

=item A Stop-and-copy communication method will be provided

Parrot will provide a function to make a call into another interpreter
and wait for that call to complete. This call may pass in data and
have data returned to it. The interpreter making the call will block
until the call is complete. The data passed in as parameters will be
copied into the called interpreter, and any return values will be
copied back into the calling interpreter. The called interpreter will
block while the return data is copied back into the calling interpreter.
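
One way to picture the stop-and-copy call is the sketch below, which is illustrative only and not Parrot's API. The C<Interp> class, its C<call> method, and the use of C<copy.deepcopy> as the copying step are all assumptions. Each interpreter is modelled as one worker thread, the caller blocks on a reply queue, and the callee waits until the caller has finished copying the result. Error handling is left out.

```python
import copy
import queue
import threading


class Interp:
    """Hypothetical interpreter: a single worker thread servicing calls."""

    def __init__(self):
        self._calls = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, args, reply, copied = self._calls.get()
            reply.put(func(*args))
            copied.wait()  # callee blocks while the caller copies the result

    def call(self, func, *args):
        """Copy args in, block until done, copy the return value back."""
        reply, copied = queue.Queue(maxsize=1), threading.Event()
        self._calls.put((func, copy.deepcopy(args), reply, copied))
        result = copy.deepcopy(reply.get())  # caller blocks here
        copied.set()
        return result


def append_one(lst):
    lst.append(1)
    return lst


other = Interp()
mine = [0]
theirs = other.call(append_one, mine)
```

Because both directions copy, the called interpreter never sees the caller's list, and the caller's list is unchanged after the call.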

=item Inter-interpreter events will be provided

Interpreters will be able to post events to other interpreters.

=item Each interpreter will have a unique id

This ID will be independent of the process or OS thread, and will be
constant across the lifetime of the interpreter. Interpreter IDs
I<may> be reused as interpreters are destroyed and recreated, and as
such are only guaranteed valid while an interpreter is in use.

(Note that we may decide to relax this requirement, but doing so
likely means moving to at least 64-bit integers to mark interpreter IDs)
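
The reuse rule can be sketched with a small allocator. The class C<InterpIds> and its method names are invented for illustration. An ID is valid only between C<allocate> and C<release>, and a released ID may be handed out again.

```python
import threading


class InterpIds:
    """Hypothetical ID allocator: IDs may be reused after release."""

    def __init__(self):
        self._lock = threading.Lock()  # global state, so mutex-protected
        self._free = []
        self._next = 0
        self._live = set()

    def allocate(self):
        with self._lock:
            if self._free:
                new_id = self._free.pop()
            else:
                new_id = self._next
                self._next += 1
            self._live.add(new_id)
            return new_id

    def release(self, interp_id):
        with self._lock:
            self._live.remove(interp_id)
            self._free.append(interp_id)

    def is_valid(self, interp_id):
        with self._lock:
            return interp_id in self._live


ids = InterpIds()
a = ids.allocate()
b = ids.allocate()
ids.release(a)
c = ids.allocate()  # reuses the ID that `a` held
```

Never reusing IDs would mean dropping the free list and only ever incrementing the counter, which is where the wider integer comes in.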

=item Each interpreter will show the same process ID

All the interpreters within a process will share a process ID. On
those systems where each thread has its own unique ID (such as many
versions of Linux) Parrot will still report a single process ID for
all interpreters.

This process ID will be the ID of the process that first instantiated

=item Interpreter pools will share allocation pools

All the interpreters in an interpreter pool will share header and
memory allocation pools. This means that when there is more than one
interpreter in a pool the memory allocation and collection system
needs to be swapped out, as a copying collector is generally
untenable in a threaded environment.

As the allocation and collection system is a black box to user
programs and much of the interpreter internals, this isn't a big deal
outside of needing swappable allocation systems, the potential
issue of COW'd shared memory leaking, and the need to switch
allocation schemes mid-execution.

=item Each interpreter has a separate event queue

Some events, such as timers, may be interpreter-specific and, as
such, each interpreter has its own event queue.

=item Each interpreter pool has a shared event queue

Some events, such as IO callbacks, may not be interpreter-specific,
and can be serviced by any interpreter in the interpreter pool. For
these events, there is a pool-wide event queue.
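
A minimal sketch of the two queue levels follows, assuming (this is not stated above) that an interpreter drains its own queue before looking at the pool's. The names C<Pool>, C<PooledInterp>, and C<next_event> are hypothetical.

```python
import queue


class Pool:
    """Hypothetical interpreter pool with one shared event queue."""

    def __init__(self):
        self.events = queue.Queue()


class PooledInterp:
    """Hypothetical interpreter with a private event queue."""

    def __init__(self, pool):
        self.pool = pool
        self.events = queue.Queue()

    def next_event(self):
        """Own queue first, then the pool-wide one; None if both are empty."""
        for q in (self.events, self.pool.events):
            try:
                return q.get_nowait()
            except queue.Empty:
                continue
        return None


pool = Pool()
a, b = PooledInterp(pool), PooledInterp(pool)
a.events.put("timer for a")   # interpreter-specific event
pool.events.put("io ready")   # any interpreter in the pool may service this

first = a.next_event()
second = b.next_event()
third = a.next_event()
```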

=item PMCs are the coordination point for threads

That is, only PMCs are shared as such between threads. Strings,
specifically, are I<not> shared between interpreters as such.

=item All PMCs shared amongst interpreters in a pool must be marked shared

A PMC which is not marked shared may not be handed to another
interpreter. Parrot will prevent this from happening either by
marking the PMC as shared, or throwing an exception when the PMC is
placed in a spot where it may be shared but is not shareable.
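
The mark-or-throw behaviour might look like the following sketch. The names C<PMC>, C<SharedContainer>, and C<NotShareable> are illustrative, and the C<shareable> flag is an assumption about how a PMC would declare that it must not be shared.

```python
class NotShareable(Exception):
    """Raised when an unshareable PMC is put somewhere it could be shared."""


class PMC:
    def __init__(self, value, shareable=True):
        self.value = value
        self.shareable = shareable  # assumed per-PMC capability flag
        self.shared = False         # PMCs start out unshared


class SharedContainer:
    """Hypothetical shared pad or namespace."""

    def __init__(self):
        self._slots = {}

    def store(self, key, pmc):
        if not pmc.shareable:
            raise NotShareable(f"cannot share PMC stored at {key!r}")
        pmc.shared = True           # mark on the way in
        self._slots[key] = pmc


pad = SharedContainer()
temp = PMC(42)
pad.store("$answer", temp)

try:
    pad.store("$handle", PMC(object(), shareable=False))
    rejected = False
except NotShareable:
    rejected = True
```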

=item All shared PMCs must have a threadsafe vtable

The first thing that any vtable function of a shared PMC must do is to
acquire the mutexes of the PMCs in its parameter list, in ascending
address order. When the mutexes are released they are not required to
be released in any order.
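
The ordering rule is the classic deadlock-avoidance trick: if every vtable function takes its mutexes in the same global order, no two of them can each hold a lock the other wants. The sketch below uses Python's C<id()> as a stand-in for the PMC's address, and all the names in it are assumptions. Duplicates are dropped first, since the mutexes are non-recursive and the same PMC can appear twice in a parameter list.

```python
import threading


class SharedPMC:
    def __init__(self, value):
        self.value = value
        self.mutex = threading.Lock()


def lock_in_address_order(*pmcs):
    """Acquire each distinct PMC's mutex in ascending id() order."""
    unique = {id(p): p for p in pmcs}          # drop aliased parameters
    ordered = [unique[k] for k in sorted(unique)]
    for p in ordered:
        p.mutex.acquire()
    return ordered


def unlock(pmcs):
    for p in pmcs:                             # release order doesn't matter
        p.mutex.release()


def add_into(dest, left, right):
    """Stand-in for a three-PMC vtable function such as add."""
    held = lock_in_address_order(dest, left, right)
    try:
        dest.value = left.value + right.value
    finally:
        unlock(held)


x, y = SharedPMC(2), SharedPMC(3)
add_into(x, x, y)  # dest aliases left; the dedup step keeps this from hanging
```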

=item Automatic PMC sharing will be provided

When a PMC is placed into a container which is shared (including
lexical pads and global namespaces) then that PMC will automatically
be marked as shared. It is acceptable for this to trigger an
exception if for some reason a PMC should not be shared between

PMCs are, by default, not shared. This avoids sharing overhead for
PMCs which are only used as temporaries and not shared. (Note that
this is dangerous, and may end up not being done, due to the sharing
of continuations)

=item All interpreter constructs in a pool are shareable

This means that a PMC or string may be used by any interpreter in a
pool. It additionally means that, if full sharing is enabled,
any interpreter in a pool may invoke a continuation, assuming the
continuation is valid. (That is, a continuation taken at parrot's top
level. Continuations taken within vtable functions, user-defined ops,
or extension code may not be shareable)

=item The embedding API will allow posting events to an interpreter

Many events are interpreter-specific, often caused by one particular
interpreter requesting an async event that later completes.

=item The embedding API will allow posting events to a pool

For events that don't have to go to any particular interpreter, they
can go into the pool's event loop.

=item The embedding API will allow calling a parrot sub with a pool

In those cases where there is an interpreter pool, embedders may call
a parrot sub using the pool as a whole, rather than an individual
interpreter, to run the sub. In that case Parrot may either choose a
dormant interpreter (if there is one) or create a new interpreter in
the pool to run the subroutine.

When the sub is done, Parrot may either cache the created
interpreter or destroy it as it needs to, though in no case will
Parrot ever leave a pool with no interpreters at all.
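
The pick-or-create policy can be sketched as follows. This is a toy, not the embedding API: C<InterpPool>, C<PoolInterp>, and the C<max_cached> knob are all invented names. A nested call is used to force the "no dormant interpreter" path.

```python
import threading


class PoolInterp:
    """Hypothetical interpreter that just runs a Python callable."""

    def run(self, sub, *args):
        return sub(*args)


class InterpPool:
    def __init__(self, max_cached=1):
        self._lock = threading.Lock()
        self.dormant = [PoolInterp()]        # never start with zero
        self.max_cached = max(1, max_cached)
        self.created = 1

    def call(self, sub, *args):
        with self._lock:
            if self.dormant:
                interp = self.dormant.pop()  # reuse a dormant interpreter
            else:
                interp = PoolInterp()        # or create a new one
                self.created += 1
        try:
            return interp.run(sub, *args)
        finally:
            with self._lock:
                if len(self.dormant) < self.max_cached:
                    self.dormant.append(interp)  # cache it
                # otherwise let it be destroyed


pool = InterpPool()
# The inner call arrives while the only interpreter is busy, so a second
# interpreter is created; afterwards only one stays cached.
result = pool.call(lambda: pool.call(lambda: 42) + 1)
```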

=back



--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
                                      have teddy bears and even
                                      teddy bears get drunk
