Re: Panther, Perl 5.8.*, threads, etc.

From: Dan Sugalski
Date: July 17, 2003 07:28
Subject: Re: Panther, Perl 5.8.*, threads, etc.
Message ID: a05210600bb3c59373ef0@[63.120.19.221]

At 6:04 PM -0400 7/16/03, Shawn Corey wrote:
>Hi,
>
>As far as I know, fork re-uses the same program image; that's what 
>the sticky bit is all about (see man 2 chmod).

No. The sticky bit marks the executable image as worthy of serious 
caching, but that's separate from fork. Don't forget that some of the 
pages in an executable image don't ever appear directly in the 
processes that invoke them, as there is potentially writable data 
needing copying in and zeroed pages to be allocated. Processes often 
map in readonly pages from cached images when those images are 
invoked, but that has nothing to do with fork. (Though it has rather 
a lot to do with exec.)
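
For anyone who wants to see which files actually carry the sticky 
bit, here is a minimal sketch in Python. The helper name 
has_sticky_bit is made up for this example; os.stat and stat.S_ISVTX 
are the standard library's names for the stat call and the sticky 
bit mask. It only reports the permission bit and says nothing about 
what a given kernel does with it.

```python
import os
import stat


def has_sticky_bit(path):
    """Return True if the sticky bit (S_ISVTX) is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_ISVTX)


if __name__ == "__main__":
    # /tmp is only an illustrative path; many Unix systems set the
    # sticky bit on it, but that is an assumption, not a guarantee.
    print(has_sticky_bit("/tmp"))
```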

>  It does re-create the data image for a new program.

No. No, it doesn't. Or at least, on systems with copy-on-write 
provided via hardware memory protection, it doesn't have to.

On systems without COW provided in some form, generally a full memory 
copy is done when the forked process is created. Memory pages tagged 
as readonly (possibly executable and readonly, depends on the system) 
will not get copied (except in really, really old fork-capable 
systems) and since this is where most of the executable code lives in 
most processes that doesn't get copied. Non-readonly pages, usually 
data, will get copied as part of the fork.

In a system with sufficient hardware support, when a new process is 
created via fork all that happens is the system marks all the 
writable PTEs (page table entries, the things the hardware memory 
management unit uses to handle virtual-to-real memory address 
remapping and memory protection) in the parent process as readonly 
and copy-on-writable, and then copies the PTEs into the child 
process. The memory pages themselves are *not* copied, just the PTEs. 
When either the child or the parent then writes to a protected page 
that page is then copied to a new segment of memory and the PTE for 
that page is updated and the readonly and COW markers are removed. 
This is generally (though *not* always) less expensive than the full 
copy method, as most of the writable pages in most processes don't 
actually get touched after forking.
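
Whichever way the kernel does it, the program sees the same thing: 
after a fork each side has its own private view of writable data. A 
small sketch of that in Python follows. The function name 
fork_private_copy and the pipe used to report back are choices made 
for this example only, and it assumes a Unix system where os.fork is 
available. It demonstrates the visible semantics, not whether the 
kernel copied eagerly or on write.

```python
import os


def fork_private_copy():
    """Fork, modify data in the child, return (parent_view, child_view)."""
    data = ["original"]
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: on a COW system this write (or an earlier write to
        # the same page) is what forces the page to be copied.
        os.close(read_fd)
        data[0] = "changed in child"
        os.write(write_fd, data[0].encode())
        os._exit(0)  # exit at once, skipping the parent's code path

    # Parent: read what the child saw, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_view = reader.read()
    os.waitpid(pid, 0)
    return data[0], child_view


if __name__ == "__main__":
    print(fork_private_copy())
```

The parent's copy of the list is untouched by the child's write, 
which is all the program can observe about the copy.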

Where the break-even point between copy-on-fork and COW falls is 
system dependent. It depends on the support the MMU provides (since 
without refcounts in the PTEs both the parent and child process 
potentially have to copy a COW'd page they write to) and on how 
expensive interrupt handling is, since cloning a COW page normally 
requires at least some minimal amount of OS intervention, to locate 
a free page to copy to if nothing else.

>This leads to the confusion I've been having; how can you create a 
>thread that's not perl? Perl (OK, advanced perl implementations) 
>allows threads but these threads must be within the same perl 
>program. A thread that runs another program/script is a fork. A 
>thread runs the same program image with the same data image. Forks 
>(processes) run a different program image and a different 
>(necessary) data image. I see no advantage in creating a thread that 
>loads a different process. Calling fork() and eval() is more 
>understandable than threading then eval(). Could someone clear up my 
>confusion?

You seem to have some fundamental confusion over what's going on with 
threads vs forks. (At least in current user-level OSes--it's all 
different in the embedded and research space.) I've no doubt what I 
wrote above won't help that. :)

A thread is just a point of execution (with all the associated bits) 
in a process. Multiple threads mean you have multiple execution 
points simultaneously in the same process. There's only one "program" 
loaded into a process, though it can certainly have chunks of code 
that are essentially independent and thus simulate multiple programs, 
but this isn't any different from your program having a sub that acts 
as if the rest of the program doesn't exist.
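
As a rough illustration of "multiple execution points in the same 
process", here is a sketch using Python's threading module. The 
names run_threads and worker are invented for the example. Each 
thread records the process ID it runs in and appends to one shared 
list, which only works because all of them live in the same process 
and see the same data.

```python
import os
import threading


def run_threads(count=4):
    """Start count threads; each records (index, pid) in a shared list."""
    results = []
    lock = threading.Lock()

    def worker(index):
        # Every thread writes to the same list object: same process,
        # same data image.
        with lock:
            results.append((index, os.getpid()))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for index, pid in sorted(run_threads()):
        print(index, pid)
```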

A fork, on the other hand, creates an entirely new, separate 
process--it does *not*, however, have to run a different program. 
Unless something's actively done (normally a call to exec) the new 
process *won't* run a different program. (Neither is there anything 
stopping a thread from running a separate program, but since doing 
this blows away all the threads in a process and starts fresh(ish) 
it's generally not done, as it's awfully drastic.)
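
To make the fork versus fork-plus-exec difference concrete, here is 
a sketch in Python, assuming a Unix system. The function names and 
the exit codes 3 and 5 are arbitrary choices for the example. The 
first child just carries on in the same program; the second 
actively replaces its image with a fresh interpreter running a 
different one-line program.

```python
import os
import sys


def child_same_program():
    """Fork only: the child keeps running this same program."""
    pid = os.fork()
    if pid == 0:
        os._exit(3)  # child: same code, its own view of the data
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def child_other_program():
    """Fork then exec: the child's program image is replaced."""
    pid = os.fork()
    if pid == 0:
        try:
            # On success execv never returns; the old image is gone.
            os.execv(sys.executable,
                     [sys.executable, "-c", "raise SystemExit(5)"])
        finally:
            os._exit(127)  # reached only if the exec itself failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(child_same_program(), child_other_program())
```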

>On Wednesday, July 16, 2003, at 04:37  PM, Dan Sugalski wrote:
>
>>At 1:15 PM -0700 7/16/03, Rich Morin wrote:
>>>At 8:33 PM +0100 7/16/03, David Cantrell wrote:
>>>>As far as the program is concerned, it's a complete copy.  But yes,
>>>>most modern virtual memory implementations will, I believe, do copy
>>>>on write.  I haven't actually tested this on OS X though :-)
>>>
>>>OK, I'm curious; how _would_ one go about testing this?
>>
>>The easiest way to do so is to snag the Darwin source and take a 
>>look at some of the low-level MMU manipulation code in the kernel. 
>>It should be pretty obvious whether (though not necessarily how :) 
>>it's done.

-- 
                                         Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
dan@sidhe.org                         have teddy bears and even
                                       teddy bears get drunk
