develooper Front page | perl.beginners | Postings from January 2002


James Lucero
January 7, 2002 07:29
Message ID:
Has anyone successfully used Parallel::ForkManager to download web
pages consistently?  Ultimately I need to download hundreds or
thousands of pages more efficiently.  I am having two primary
problems.  First, it's not very fast for me: in testing I am only
using 3 or 4 URLs so far, and I can download many more pages faster
without it.  Second, the results of my script are inconsistent.
Sometimes I get all the requested pages; other times the script
errors out before getting all the pages.  I am running on NT.  I
suspect that I am not using "sub wait_all_childs" properly.  I
believe that the script quits prematurely even though I have called
the "wait" sub.  I have inserted it in every place that I can think
of and it's still inconsistent.  I've included an excerpt of the
code below.  I will also try using ParallelUserAgent to speed up my
downloads, once I get ForkManager to work.

James

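use Parallel::ForkManager;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Status;
use HTTP::Request;

# The URL values were stripped from the archived message.
%urls = ( 'drudge' => '',
          'rush'   => '',
          'yahoo'  => '',
          'cds'    => '', );

foreach $myURL (sort(values(%urls))) {
    $count++;    # assumed: the line that sets $count is missing from the archive
    print "Count is $count\n";
    $document = DOCUMENT_RETRIEVER($myURL);
}

sub DOCUMENT_RETRIEVER {    # assumed: the sub header is missing from the archive
    $mit = $myURL;
    print "Commencing DOCUMENT_RETRIEVER number $iteration for $mit\n";
    print "Iteration is $iteration and Count is $count\n";

    # The end of this loop header is missing from the archive; "$iteration++" is assumed.
    for ($iteration = $count; $iteration <= $count; $iteration++) {
        $name = $iteration;
        print "NAME $name\n";
        my $pm = new Parallel::ForkManager(30);
        $pm->start and next;    # parent skips ahead; child continues below
        print "Starting Child Process $iteration for $mit\n";
        $ua = LWP::UserAgent->new;
        $ua->agent("$0/0.1 " . $ua->agent);
        $req = new HTTP::Request 'GET' => "$mit";
        $res = $ua->request($req, "$name.html");    # save the response body to "$name.html"
        print "Process $iteration Complete\n";
        # [line(s) possibly missing from the archive]
        print "Waiting on children\n";
        # [line(s) missing from the archive; presumably the wait_all_childs call mentioned above]
        undef $name;
    }
}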
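
For comparison with the excerpt above, here is a minimal sketch of the start/finish/wait pattern described in the Parallel::ForkManager documentation. It is not a reconstruction of the original script. It assumes a real fork() (behaviour under NT's fork emulation may differ), and the names fetch_all, fetch_to_file, and the name=url command-line format are hypothetical, introduced only for illustration. The manager is created once before the loop, each child ends with finish, and the parent calls wait_all_children after the loop (recent releases also accept the older spelling wait_all_childs, as far as I know).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
use LWP::UserAgent;

# Fetch every URL in %$urls in parallel, at most $max_procs at a time.
# $fetch is a callback ($url, $file) returning true on success; this
# interface is hypothetical and exists only for this sketch.
# Returns a hash ref mapping each name to its child's exit code.
sub fetch_all {
    my ($urls, $max_procs, $fetch) = @_;

    # One manager for the whole run, created before the loop.
    my $pm = Parallel::ForkManager->new($max_procs);

    my %exit_code;
    # Runs in the parent each time a child is reaped.
    $pm->run_on_finish(sub {
        my ($pid, $code, $ident) = @_;
        $exit_code{$ident} = $code;
    });

    for my $name (sort keys %$urls) {
        # start() forks. In the parent it returns the child's pid (true),
        # so the parent moves on to the next URL; in the child it
        # returns 0 and execution falls through.
        $pm->start($name) and next;

        my $ok = $fetch->($urls->{$name}, "$name.html");

        # finish() ends the child. Without it the child would carry on
        # running the parent's loop.
        $pm->finish($ok ? 0 : 1);
    }

    # Block until every child has exited.
    $pm->wait_all_children;
    return \%exit_code;
}

# Download one URL straight into a file; true on HTTP success.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url, ':content_file' => $file);
    return $res->is_success;
}

# Usage (hypothetical): perl fetch.pl name=url [name=url ...]
if (@ARGV) {
    my %urls = map { my ($n, $u) = split /=/, $_, 2; ($n => $u) } @ARGV;
    my $status = fetch_all(\%urls, 4, \&fetch_to_file);
    for my $name (sort keys %$status) {
        printf "%-10s %s\n", $name, $status->{$name} == 0 ? 'ok' : 'failed';
    }
}
```

Passing the fetch routine in as a callback keeps the forking logic separate from the network code, so the process handling can be exercised with a stub before pointing it at real pages.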
