Re: Panther, Perl 5.8.*, threads, etc.
From: Dan Sugalski
July 17, 2003 07:28
At 6:04 PM -0400 7/16/03, Shawn Corey wrote:
>As far as I know, fork re-uses the same program image; that's what
>the sticky bit is all about (see man 2 chmod).
No. The sticky bit marks the executable image as worthy of serious
caching, but that's separate from fork. Don't forget that some of the
pages in an executable image don't ever appear directly in the
processes that invoke them, as there is potentially writable data
needing copying in and zeroed pages to be allocated. Processes often
map in readonly pages from cached images when those images are
invoked, but that has nothing to do with fork. (Though it has rather
a lot to do with exec.)
> It does re-create the data image for a new program.
No. No, it doesn't. Or at least, on systems with copy-on-write
provided via hardware memory protection, it doesn't have to.
On systems without COW provided in some form, generally a full memory
copy is done when the forked process is created. Memory pages tagged
as readonly (possibly executable and readonly, depends on the system)
will not get copied (except in really, really old fork-capable
systems), and since this is where most of the executable code lives in
most processes, the code doesn't get copied. Non-readonly pages, usually
data, will get copied as part of the fork.
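Whichever strategy the kernel uses, what a program can observe is the same: after the fork the child's writes do not show up in the parent. Below is a minimal sketch of that, in Python rather than Perl purely for illustration. The function name `fork_and_modify` and the pipe used to report back are assumptions of the example, not anything from this thread, and it needs a Unix-like system where `os.fork` is available.

```python
import os

def fork_and_modify():
    """Fork, let the child change a variable, and report what each side sees."""
    data = ["original"]              # ordinary writable data in the parent
    read_end, write_end = os.pipe()  # lets the child report back to the parent

    pid = os.fork()
    if pid == 0:
        # Child: still the same program, but with its own view of the data.
        os.close(read_end)
        data[0] = "changed by child"
        os.write(write_end, data[0].encode())
        os.close(write_end)
        os._exit(0)                  # exit without running the parent's cleanup

    # Parent: collect what the child saw, then look at our own copy.
    os.close(write_end)
    child_view = os.read(read_end, 1024).decode()
    os.close(read_end)
    os.waitpid(pid, 0)
    return child_view, data[0]

if __name__ == "__main__":
    child_view, parent_view = fork_and_modify()
    print("child saw:  ", child_view)
    print("parent sees:", parent_view)
```

Note that this cannot tell a full copy at fork time apart from copy-on-write; both produce the same result from the program's point of view.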
In a system with sufficient hardware support, when a new process is
created via fork all that happens is the system marks all the
writable PTEs (page table entries, the things the hardware memory
management unit uses to handle virtual-to-real memory address
remapping and memory protection) in the parent process as readonly
and copy-on-writable, and then copies the PTEs into the child
process. The memory pages themselves are *not* copied, just the PTEs.
When either the child or the parent then writes to a protected page,
that page is copied to a new chunk of memory, the writing process's
PTE for that page is updated to point at the copy, and the readonly
and COW markers are removed. This is generally (though *not* always)
less expensive than the full copy method, as most of the writable
pages in most processes don't actually get touched after forking.
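The bookkeeping described above can be sketched as a toy model. This is an illustration only, not how Darwin or any real kernel is written: the class names `Page`, `PTE`, and `Process` are made up for the example, and the per-page reference count is an assumption of the sketch (it lets the last remaining owner of a page take write access back without copying).

```python
class Page:
    """A physical page: some bytes plus a count of PTEs pointing at it."""
    def __init__(self, contents):
        self.contents = bytearray(contents)
        self.refs = 1

class PTE:
    """Toy page table entry: which page it maps, plus protection bits."""
    def __init__(self, page, writable, cow=False):
        self.page = page
        self.writable = writable
        self.cow = cow

class Process:
    def __init__(self, ptes):
        self.ptes = ptes
        self.pages_copied = 0

    def fork(self):
        """Copy the PTEs, not the pages; writable pages become readonly + COW."""
        child_ptes = []
        for pte in self.ptes:
            if pte.writable:
                pte.writable = False
                pte.cow = True
            pte.page.refs += 1
            child_ptes.append(PTE(pte.page, pte.writable, pte.cow))
        return Process(child_ptes)

    def write(self, index, offset, value):
        """Write one byte, handling the simulated protection fault if needed."""
        pte = self.ptes[index]
        if not pte.writable:
            if not pte.cow:
                raise PermissionError("write to a genuinely read-only page")
            if pte.page.refs > 1:
                # Still shared: give this process a private copy of the page.
                pte.page.refs -= 1
                pte.page = Page(pte.page.contents)
                self.pages_copied += 1
            # Either way the page is now private, so drop the COW protection.
            pte.writable = True
            pte.cow = False
        pte.page.contents[offset] = value

if __name__ == "__main__":
    code = PTE(Page(b"code"), writable=False)   # shared, never copied
    data = PTE(Page(b"data"), writable=True)    # becomes COW at fork
    parent = Process([code, data])
    child = parent.fork()
    child.write(1, 0, ord("D"))
    print(bytes(parent.ptes[1].page.contents), bytes(child.ptes[1].page.contents))
    print("pages copied:", parent.pages_copied, child.pages_copied)
```

Only the page that actually gets written to is copied; the code page stays shared for the life of both processes.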
Where the break-even point between copy-on-fork and COW falls is system
dependent. It depends on the support the MMU provides (since without
refcounts in the PTEs both the parent and child process potentially
have to copy a COW'd page they write to) and on how expensive interrupt
handling is, since cloning a COW page normally requires at least some
minimal amount of OS intervention, to locate a free page to copy to
if nothing else.
>This leads to the confusion I've been having; how can you create a
>thread that's not perl? Perl (OK, advanced perl implementations)
>allows threads but these threads must be within the same perl
>program. A thread that runs another program/script is a fork. A
>thread runs the same program image with the same data image. Forks
>(processes) run a different program image and a different
>(necessary) data image. I see no advantage in creating a thread that
>loads a different process. Calling fork() and eval() is more
>understandable than threading then eval(). Could someone clear up my
You seem to have some fundamental confusion over what's going on with
threads vs forks. (At least in current user-level OSes--it's all
different in the embedded and research space) I've no doubt what I
wrote above won't help that. :)
A thread is just a point of execution (with all the associated bits)
in a process. Multiple threads mean you have multiple execution
points simultaneously in the same process. There's only one "program"
loaded into a process, though it can certainly have chunks of code
that are essentially independent and thus simulate multiple programs,
but this isn't any different from your program having a sub that acts
as if the rest of the program doesn't exist.
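As a rough illustration of "multiple execution points in the same process", here is a small sketch in which several threads each record their process ID into one shared list. It uses Python's `threading` module purely for illustration; the function name `run_threads` is an assumption of the example, and Perl's own threads differ in what data is shared by default.

```python
import os
import threading

def run_threads(count=4):
    """Start `count` threads; each records its label and the process ID it sees."""
    shared = []                      # one list, visible to every thread
    lock = threading.Lock()          # guards the shared list

    def worker(label):
        with lock:
            shared.append((label, os.getpid()))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared

if __name__ == "__main__":
    results = run_threads()
    print("main process ID:", os.getpid())
    for label, pid in sorted(results):
        print("thread", label, "ran in process", pid)
```

Every thread reports the same process ID and appends to the same list, which is the opposite of what happens across a fork.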
A fork, on the other hand, creates an entirely new, separate
process--it does *not*, however, have to run a different program.
(Neither is there anything stopping a thread from running a separate
program, but since doing this blows away all the threads in a process
and starts fresh(ish) it's generally not done, as it's awfully
drastic) Unless something's actively done it *won't* run a different
>On Wednesday, July 16, 2003, at 04:37 PM, Dan Sugalski wrote:
>>At 1:15 PM -0700 7/16/03, Rich Morin wrote:
>>>At 8:33 PM +0100 7/16/03, David Cantrell wrote:
>>>>As far as the program is concerned, it's a complete copy. But yes,
>>>>most modern virtual memory implementations will, I believe, do copy
>>>>on write. I haven't actually tested this on OS X though :-)
>>>OK, I'm curious; how _would_ one go about testing this?
>>The easiest way to do so is to snag the Darwin source and take a
>>look at some of the low-level MMU manipulation code in the kernel.
>>It should be pretty obvious whether (though not necessarily how :)
--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
email@example.com                     have teddy bears and even
                                      teddy bears get drunk