Parallel::ForkManager
From: James Lucero
Date: January 7, 2002 07:29
Subject: Parallel::ForkManager
Message ID: 20020107152655.21248.qmail@web10902.mail.yahoo.com
Has anyone successfully used Parallel::ForkManager
to download web pages consistently? Ultimately I need
to download hundreds or thousands of pages more
efficiently. I am having two main problems. First, it's
not very fast for me (although in testing I am only
using 3 or 4 URLs so far, I can download many more
pages faster without it). Second, the results of my
script are inconsistent: sometimes I get all the
requested pages, and other times the script errors out
before getting all of them. This is on NT. I suspect
that I am not using the "wait_all_childs" sub
properly. I believe the script quits prematurely
even though I have called the "wait" sub.
I have inserted it in every place I can think of,
and it's still inconsistent. I've included an excerpt
of the code. I will also try using ParallelUserAgent
to speed up my downloads once I get ForkManager to
work.

James
########################
use Parallel::ForkManager;
use LWP::Simple;
use LWP::UserAgent ;
use HTTP::Status ;
use HTTP::Request ;
%urls = (
    'drudge' => 'http://www.drudgereport.com',
    'rush'   => 'http://www.rushlimbaugh.com/home/today.guest.html',
    'yahoo'  => 'http://www.yahoo.com',
    'cds'    => 'http://www.cdsllc.com/',
);
foreach $myURL (sort(values(%urls)))
{
    $count++;
    print "Count is $count\n";
    $document = DOCUMENT_RETRIEVER($myURL);
}
sub DOCUMENT_RETRIEVER
{
    $myURL = $_[0];
    $mit = $myURL;
    print "Commencing DOCUMENT_RETRIEVER number $iteration for $mit\n";
    print "Iteration is $iteration and Count is $count\n";
    for ($iteration = $count; $iteration <= $count; $iteration++)
    {
        $name = $iteration;
        print "NAME $name\n";
        my $pm = new Parallel::ForkManager(30);
        $pm->start and next;
        print "Starting Child Process $iteration for $mit\n";
        $ua = LWP::UserAgent->new;
        $ua->agent("$0/0.1 " . $ua->agent);
        $req = new HTTP::Request 'GET' => "$mit";
        $res = $ua->request($req, "$name.html");
        print "Process $iteration Complete\n";
        $pm->finish;
        $pm->wait_all_childs;
        print "Waiting on children\n";
    }
    undef $name;
}