develooper Front page | perl.beginners | Postings from February 2002

Still can't extract data using HTML::TokeParser

Daniel Falkenberg
February 24, 2002 20:00
Hey all,

Just wondering why I still can't get HTML::TokeParser to either download
the page I am looking for or at least store the HTML from the requested
page.  I know I could quite easily do this if I used HTML::TableExtract,
except the data I want is only about 3 lines of HTML and there are no
tables at all in there.  Therefore I cannot use HTML::TableExtract.  So
I was wondering how I would go about extracting data from the following:
<HTML><HEAD><TITLE>Get all data from H1</TITLE> </HEAD><BODY
BGCOLOR="FFFFFF"><h1>I want all if this data extracted from heading 1
(h1)</h1> </BODY></HTML>

So using the following code I figured it would be really simple to
extract the data I wanted.  Just a note that the pages I want will
change with the different CGI parameters I pass to the requested URL.
Does anyone have any ideas?

use LWP::UserAgent;
use HTML::TableExtract;
use HTML::TreeBuilder;
use HTML::TokeParser;
use CGI qw(:all);
use CGI::Carp qw(fatalsToBrowser);

my $ua = LWP::UserAgent->new;

$inputSite = "<URL HERE>";
$address = "http://" . $inputSite;
$request = HTTP::Request->new('GET', $address);
$response = $ua->request($request);
my $found = 0;

my $content = $response->content;
$p = HTML::TokeParser->new($content) || die "Can't open: $!";
while ($stream->get_tag("h1")) { $data = get_trimmed_text("/h1");}
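For comparison, here is a minimal sketch of how the extraction step might look once three likely problems in the code above are addressed: HTML::TokeParser->new treats a plain string as a file name, so the fetched HTML needs to be passed as a reference; the parser object is stored in $p, not $stream; and get_trimmed_text is a method on the parser. The helper name extract_h1 is hypothetical, and the sample page from this post stands in for $response->content, which would normally be used only after checking $response->is_success.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample page from the post. In the real script this would be
# $response->content, used only if $response->is_success is true.
my $content = <<'HTML';
<HTML><HEAD><TITLE>Get all data from H1</TITLE> </HEAD><BODY
BGCOLOR="FFFFFF"><h1>I want all if this data extracted from heading 1
(h1)</h1> </BODY></HTML>
HTML

# Hypothetical helper: returns the trimmed text of every <h1> in $html.
sub extract_h1 {
    my ($html) = @_;

    # Pass a REFERENCE to the string; a plain scalar would be
    # interpreted as the name of a file to open.
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't create parser: $!";

    my @headings;
    # get_tag skips ahead to the next <h1>; get_trimmed_text then
    # collects the text up to the matching </h1>.
    while ($p->get_tag("h1")) {
        push @headings, $p->get_trimmed_text("/h1");
    }
    return @headings;
}

print "$_\n" for extract_h1($content);
```

Note that HTML::TokeParser only parses; the download itself is still done by LWP::UserAgent, so an empty result can also mean the request failed before any parsing happened.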



-----Original Message-----
From: Chris Ball []
Sent: Friday, 22 February 2002 9:49 PM
To: Daniel Falkenberg
Subject: Re: What would take care of this?...

>>>>> "Daniel" == Daniel Falkenberg <> writes:

    Daniel> Would I now have to go ahead and use HTML::parser or
    Daniel> something of similar nature to extract headings?

Yeah, go with HTML::TokeParser.

    Daniel> <HTML><HEAD><TITLE>Get all data from H1</TITLE> </HEAD><BODY
    Daniel> BGCOLOR="FFFFFF"><h1>I want all if this data extracted from
    Daniel> heading 1 (h1)</h1> </BODY></HTML>

while ($stream->get_tag("h1")) { $data = $stream->get_trimmed_text("/h1"); }

(Also see perldoc HTML::TokeParser, once it's installed.)

- Chris.
$a=""; Chris Ball | chris@void.$a | www.$a | finger: chris@$a
         "In the beginning there was nothing, which exploded."
