
perlthread.pod -- First draft of thread docs

From: Dan Sugalski
Date: October 22, 1999 07:32
Subject: perlthread.pod -- First draft of thread docs
Message ID: 3.0.6.32.19991022103610.00c711c0@tuatha.sidhe.org
Comments anyone?

-------Cut here with a sharp knife-----------
=head1 NAME

perlthread - Perl's thread interface

=head1 DESCRIPTION

=head2 Perl thread basics

When built with the appropriate options, perl provides facilities for
creating and managing threads. A thread is, more or less, a sort of
mini-process that executes entirely within the context of your main
process. Every thread in a process has access to all the resources of
any other thread in that same process.

This document describes the default threading environment provided by
the L<Thread> module that's provided with the core distribution. Other
interfaces and environments are possible.

Be aware that perl's thread model is not a direct implementation of
any of the extant models, though it is based on, and usually
implemented on top of, POSIX's threads. This means that any experience
you may have with anyone else's thread package isn't directly
applicable to perl. (For example, you won't find thread priorities or
mutexes in here.)

There is no real hierarchy between threads, and with the exception of
the main thread, no thread is particularly special.

=head2 Creating threads

=over 4

=item new

To create a new thread, call Thread->new with a code reference as its
first argument, followed by any parameters that you want passed into
the new thread. Perl creates a new thread, which then calls the
subroutine with the parameters you passed.

The return value is a thread object that represents the newly created
thread.
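
As a minimal sketch, assuming a perl built with thread support and
calling new as a class method (the sum subroutine and its arguments
are made up for illustration):

```perl
use strict;
use warnings;
use Thread;

# Hypothetical worker: adds up whatever arguments the thread was given.
sub sum {
    my $total = 0;
    $total += $_ for @_;
    return $total;
}

# The code reference comes first, then the parameters for the new thread.
my $thr = Thread->new(\&sum, 1, 2, 3);

# Wait for the thread to finish and collect its return value.
my ($result) = $thr->join;
print "sum is $result\n";
```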

=item async

async takes a coderef and starts executing it in a new thread. The
most common way to call async is with the code immediately following
it, as with eval:

    my $tid = async {
        my $var = 1000;
        while ($var--) {
            sleep 1;
        }
    };

Note the trailing semi-colon after the block.

async is not exported by default--if you want it, you need to ask for
it when you use the Thread module. (If you don't, the code in the
block will be executed and its return value will be treated like an
object, and perl will try to call its async method. Odds are this'll
fail miserably.)
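
A short sketch of requesting async at import time and collecting the
block's return value, assuming a perl built with thread support (the
value computed in the block is arbitrary):

```perl
use strict;
use warnings;
use Thread qw(async);   # async has to be asked for explicitly

# Start a thread running the block; 6 * 7 is just a placeholder job.
my $thr = async {
    return 6 * 7;
};

# async hands back a thread object, so the result is fetched with join.
my ($answer) = $thr->join;
print "thread returned $answer\n";
```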

=back

=head2 Locking

Perl code must never access a variable simultaneously in two or more
threads. The only way to safely share a variable between threads is to
lock it with the lock() function before accessing it. lock() is
advisory, in that it only blocks other lock() calls on the same
variable rather than actual access to the variable. Locks are
dynamically scoped, much like the way local works, and stay held until
the enclosing scope exits.

For example:

    my $foo;
    {
        lock $foo; # $foo is now locked
        bar(1);    # $foo stays locked even inside bar
    } # When execution gets here, the lock on $foo is released

Failure to lock variables will at best give wrong answers, at worst
cause perl to coredump.

The single exception to the advisory nature of locks is locking
subroutines. If a thread locks a subroutine, perl will prevent any
other thread from entering it until the lock is released.

Locking aggregates, such as hashes and arrays, is perfectly legal, and
will block (and be blocked by) other locks on those same arrays and
hashes. B<Locking an aggregate will not lock each member of the
aggregate!> This is very important to remember. The reason is
simple--the only safe way to do it would be to walk the hash or array
and lock each member separately, which is terribly
inefficient. (There's no way that locking one of the elements of an
aggregate could check the lock status of the entire aggregate, as an
individual scalar doesn't keep track of what aggregate it's in.)

lock() will also do one free dereference. This sequence:

    {
        my ($foo, $bar);
        $foo = \$bar;
        lock $foo;
    }

will actually take the lock on $bar. This is done so locks taken on
objects (or, rather, object references, which is what you usually use)
will be handled properly. It's arguably a bug that the dereference is
unconditional, rather than just done on references to blessed
things. Don't, therefore, count on the automatic dereference working
on references to non-blessed things.

=head2 Reaping threads

When a thread finishes, perl maintains a structure that holds the
return values from the sub executed in the thread, as well as any
exception the thread may have thrown. This structure hangs around
either until another thread retrieves it or until the program exits.

=over 4

=item join

Call the join method on a thread object to wait for the thread to
finish and then retrieve its return values.

       @return = $thr->join();

If the thread threw an exception, that exception will be rethrown in
the joining thread.

=item eval

The eval method works just like the join method, except it catches any
exceptions the thread might have thrown.

       @return = $thr->eval();
       if ($@) {
           print "Thread threw the error $@";
       }

Like join, eval will block until the thread being eval'd finishes.

=item detach

Calling detach on a thread object marks it as unjoinable. When the
thread finishes, perl will automatically discard any values it
returned and generally clean up after it.

Once detached, a thread can't be joined. Only detach those threads
that don't return any interesting information.

=back

=head2 Conditions and condition signalling

Perl provides a simple communication mechanism that allows threads to
coordinate their activities. 

=over 4

=item cond_wait

cond_wait() takes a locked variable as an argument and puts the thread
to sleep until another thread does a cond_signal() or cond_broadcast()
on that variable.

=item cond_signal

cond_signal() takes a locked variable as an argument and wakes up one
thread that is cond_wait()ing on it. If there are no threads in a
cond_wait() on the variable then the signal is discarded.

=item cond_broadcast

cond_broadcast() takes a locked variable as an argument and sends a
signal to all the threads cond_wait()ing on it.

=back

cond_wait() and cond_signal()/cond_broadcast() aren't, by themselves,
sufficient to implement reliable communications between threads,
because a signal sent while no thread is waiting is discarded. What
you need to do is keep the state in the condition variable itself and
use cond_signal() to indicate that it has changed. The sequence in a
thread that needs to wait should always be:

=over 4

=item open block

=item lock the coordinating variable

=item check the coordinating variable to see if it's in the proper state

=item cond_wait if it isn't

=back

For example:

    {
        lock $foo;
        cond_wait $foo unless $foo;
    }

The sequence that a signalling thread goes through should be:

=over 4

=item open block

=item lock the coordinating variable

=item set the condition

=item signal

=back

For example:

    {
        lock $foo;
        $foo = 12;
        cond_signal($foo);
    }

=head2 Thread utility functions

=over 4

=item tid

The tid() method returns the thread id for a thread object. A tid is
an integer that uniquely identifies a running thread. There are no
guarantees that a tid won't be reused in a single run of perl, though
that's currently not done.

=item list

list() returns a list of thread objects for all the currently running threads.

=item equal

equal() can either be called as a method:

    $thr1->equal($thr2);

or as a function

    equal($thr1, $thr2);

It returns true if the two thread objects refer to the same thread, or
false otherwise.

=item self

self() is a package method that returns a thread object that refers to
the current thread.

=item yield

yield() asks the underlying thread library to give another thread the
CPU. Whether this works, and which thread gets time, depends on the
thread library.

=back
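
A short sketch of these utility methods working together, assuming a
perl built with thread support. The variable names are illustrative,
and the actual tid values depend on the implementation:

```perl
use strict;
use warnings;
use Thread;

my $main = Thread->self;    # object for the current (main) thread

# The worker reports its own tid as its return value.
my $worker = Thread->new(sub { return Thread->self->tid });

# Compare thread objects while the worker has not yet been joined.
my $main_is_self   = $main->equal(Thread->self);
my $main_is_worker = $main->equal($worker);

my $seen_outside  = $worker->tid;     # tid via the object returned by new
my ($seen_inside) = $worker->join;    # tid via self() inside the worker

print "worker tid: $seen_outside (outside), $seen_inside (inside)\n";
print "main is the current thread\n" if $main_is_self;
print "main and worker differ\n"     unless $main_is_worker;
```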

=head2 Threads and exceptions

In unthreaded perl, when a program throws an exception that's not
caught by an eval, the error message is sent to STDERR and the program
exits. The same thing happens in threaded perl if the main thread
throws an exception. If another thread throws an exception, however,
things are slightly different.

When a thread dies with an exception, the exception is B<not> thrown
immediately. Instead, it is held until the thread is joined, then
thrown in the joining thread. The join may be wrapped in an eval, of
course, and the exception caught. This 'held exception' rule applies
both to exceptions thrown with die() or croak() and to exceptions that
perl itself throws, such as division by zero errors.

=head2 Threads and signals

Threads and signals don't coexist particularly well in many
cases. Signals are, generally speaking, a process-level construct
while threads live one step below that. When a signal is thrown,
there's no good way to figure out which thread will receive it. Some
thread implementations deliver it to the main thread, some to the
currently running thread, some to the thread that threw it (if that's
even determinable), and some just punt.

This makes using alarm() to time out I/O operations pretty much
impossible with threaded perl.

If you need to establish signal handlers to catch process-wide
signals, you should use the L<Thread::Signal> module to set up a
signal-handling thread.

=head2 Threads, forking, and exec

You must be careful when calling fork() and exec() in a threaded
program. Their behavior can be mildly surprising.

=over 4

=item fork

On platforms that support fork (currently Unix), when you fork a
threaded program, only the thread executing the fork runs in the new
child process. No matter how many threads were running in the parent,
the child has exactly one: the one that forked.

This can cause you difficulties if there were any mutexes locked by
other threads when the fork took place, as those other threads don't
exist in the child and therefore can't release any mutexes.

=item exec

When you call exec() on Unix or VMS it replaces the currently running
program with the program you exec()ed. This means that if you exec()
from any thread in a threaded perl program the entire process is
tossed out for the new one, not just the thread that executed the
exec().

This is different from the behaviour on Windows, where calling exec()
only replaces the thread calling exec(), leaving the other threads
untouched and running.

=back

=head2 Important things to remember about threads

=over 4

=item *

Don't exit the main thread with running child threads. This will
likely do Very Bad Things, which range from killing all the child
threads immediately, to locking up until all the child threads exit,
to segfaulting when the garbage collector goes into its global cleanup
phase on things still being used by child threads.

=item *

Nothing in perl is atomic. A statement like

    $a++;

looks like it ought to execute atomically, but there's no guarantee of
that. Another thread accessing $a may change it between the time it's
fetched and the time it's incremented and set.

=item *

Uncoordinated access to perl data structures may cause perl to
segfault or otherwise die. Always lock your shared variables before
using them. Even when it doesn't crash, uncoordinated access will
sometimes give you wrong answers, so it's a bad idea anyway.

=back

=head1 AUTHOR

Dan Sugalski E<lt>dan@sidhe.orgE<gt>

=head1 SEE ALSO

L<perlthrtut>, L<Thread>, L<Thread::Queue>, L<Thread::Semaphore>


