
Re: web scraping

From: Octavian Rasnita
Date: April 28, 2008 13:53
Subject: Re: web scraping
First, search search.cpan.org for Finance (without quotes) and see whether
a module already downloads the data you want. If you don't find one, you
can use LWP::UserAgent or WWW::Mechanize together with regular expressions
to do it.

A very simple example that gets the title of Google's page:

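use strict;
use warnings;
use LWP::Simple;

# get() returns undef on failure, so check before matching
my $content = get("http://www.google.com/");
die "Could not fetch the page\n" unless defined $content;

# Capture the text between <title> and </title> (case-insensitive, may span lines)
my ($title) = $content =~ /<title[^>]*>(.*?)<\/title[^>]*>/si;

print "$title\n" if defined $title;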
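
If you go the LWP::UserAgent route instead, the same idea might look roughly
like the sketch below. The names fetch_page and extract_title are my own
helpers for illustration, not part of any module, and the 10-second timeout
is an arbitrary assumption. The demonstration at the bottom runs on an inline
string, so it does not touch the network.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch a page and return its body,
# or die with the HTTP status line.
sub fetch_page {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded on demand; not part of core Perl
    my $ua = LWP::UserAgent->new(timeout => 10);    # timeout is an assumption
    my $response = $ua->get($url);
    die "GET $url failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Hypothetical helper: return the text of the first <title> element,
# or undef if there is none.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;
    my ($title) = $html =~ m{<title[^>]*>(.*?)</title\s*>}si;
    return unless defined $title;
    $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    return $title;
}

# Demonstration on an inline string, so no network access is needed.
my $sample = "<html><head><TITLE>\n Example Page </TITLE></head></html>";
print extract_title($sample), "\n";
```

Against a live page you would call extract_title(fetch_page($url)) with
whatever URL you want. Keeping the fetch and the regular expression in
separate subs lets you try the pattern on saved HTML before hitting the site.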

Octavian

----- Original Message ----- 
From: "Rob Dixon" <rob.dixon@gmx.com>
To: <beginners@perl.org>
Cc: "Alex Goor" <a_goor@yahoo.com>
Sent: Monday, April 28, 2008 9:15 PM
Subject: Re: web scraping


> Alex Goor wrote:
>> I was hoping to write a simple program (if that's possible) to open a
>> browser, go to a site, and scrape a piece of information from that
>> site.
>>
>> For example, I was hoping to open a Safari or Firefox browser, go to
>> nyt.com and scrape the Dow Jones Industrial Average which is on the
>> homepage.
>>
>> Does anyone know where I could get an example program that does this
>> kind of thing to teach myself the concepts?
>
> Driving an actual Web browser is awkward and unnecessary unless the page
> you want cannot be handled with a Perl module.
>
> Take a look at WWW::Mechanize and see if it suits your purpose.
>
> Rob
>

