perl.libwww | Postings from December 2001

Extracting Titles from Multiple URLs

Michael Bauer
December 4, 2001 00:13

Hi. I'm trying to take a list of URLs:

and get the title of each page. The problem is that some of the URLs don't
exist. As I understand it, setting a low timeout on the UserAgent doesn't
apply until a connection has been made and data is being transferred, so
for a non-existent URL I can wait quite a while before anything times out!
Here's the code I was using (copied from the LWP examples):


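use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::Heuristic;

my $ifile = shift @ARGV or die "usage: $0 url-list-file\n";

open(my $in, '<', $ifile) or die "can't open $ifile - $!\n";

$| = 1;                              # flush output after every print
my $ua = LWP::UserAgent->new();      # one agent, reused for every URL
$ua->timeout(10);                    # example value, in seconds

while (my $raw_url = <$in>) {
    chomp $raw_url;                  # drop the trailing newline
    next unless $raw_url =~ /\S/;    # skip blank lines
    # adds a missing scheme, e.g. "www.perl.com" -> "http://www.perl.com"
    my $url = URI::Heuristic::uf_urlstr($raw_url);
    print $url . "\t";
    my $req = HTTP::Request->new(GET => $url);
    my $response = $ua->request($req);
    if ($response->is_error()) {
        print $response->status_line . "\n";
    } else {
        # title() comes from the HTML <title>; undef for non-HTML pages
        my $title = $response->title();
        print defined $title ? $title : '(no title)', "\n";
    }
}
close($in);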

Should I wrap this code in something that first checks whether the host
can be found, before trying to get the title from a web site running on
that host?
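A minimal sketch of the pre-check asked about above, assuming "can be found" means "the host name resolves in DNS" and using only Perl's built-in gethostbyname. The helper names host_of and host_resolves are hypothetical, and the regex in host_of is a simplified stand-in for a real URL parser such as URI's ->host (it does not handle bracketed IPv6 hosts).

```perl
use strict;
use warnings;

# Hypothetical helper: extract the host name from an absolute URL such as
# "http://user@www.example.com:8080/page". Returns undef if there is no scheme.
# Simplified: bracketed IPv6 hosts are not handled.
sub host_of {
    my ($url) = @_;
    return $url =~ m{^[a-z][a-z0-9+.\-]*://(?:[^/\@]*\@)?([^/:?#]+)}i
        ? lc $1
        : undef;
}

# Hypothetical helper: true (1) if the URL's host resolves to an address.
# In scalar context gethostbyname returns the packed address, or undef on failure.
sub host_resolves {
    my ($url) = @_;
    my $host = host_of($url);
    return 0 unless defined $host;
    return defined(scalar gethostbyname($host)) ? 1 : 0;
}

# Usage: perl check_hosts.pl http://www.example.com/ http://no-such-host.invalid/
for my $url (@ARGV) {
    if (host_resolves($url)) {
        print "$url\thost resolves; go ahead and fetch the title with LWP\n";
    } else {
        print "$url\thost not found; skipping\n";
    }
}
```

One caveat on this design: the lookup is the same kind of name resolution LWP performs when it connects, so the check mainly lets unknown hosts be reported and skipped separately. It does not by itself put an upper bound on how long a slow DNS server takes to answer.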

Michael Bauer
