perl.libwww | Postings from July 2001

HTML::Parser - Extracting out the text from <body>

From: Bill Moseley
Date: July 2, 2001 11:17
Subject: HTML::Parser - Extracting out the text from <body>
Message ID: 3.0.3.32.20010702111700.0251d130@pop3.hank.org

Hello,

I need to extract the text from HTML documents so I can highlight search
words in context. (You know, like Google's output.)
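Once the plain text is available, the "in context" part could be done along these lines in plain Perl: find each search word and keep a fixed number of characters on either side of it. This is a minimal sketch under assumptions: the function name highlight_snippets, the [brackets] used as highlight markers, and the default radius and max values are hypothetical choices, not anything from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $max snippets showing each match of a
# search word with $radius characters of context on either side.
# Matches are wrapped in [brackets]; HTML markup such as <b> would work the same way.
sub highlight_snippets {
    my ($text, $words, %opt) = @_;
    my $radius = defined $opt{radius} ? $opt{radius} : 30;   # illustrative default
    my $max    = defined $opt{max}    ? $opt{max}    : 3;    # illustrative default
    return () unless @$words;

    # One case-insensitive alternation of whole words, longest first so that
    # a phrase like "perl parser" wins over "perl" alone.
    my $alt = join '|', map { quotemeta($_) } sort { length($b) <=> length($a) } @$words;
    my $re  = qr/\b($alt)\b/i;

    my @snippets;
    my $len = length $text;
    while ($text =~ /$re/g) {
        my ($start, $end) = ($-[1], $+[1]);    # offsets of the matched word
        my $from = $start > $radius ? $start - $radius : 0;
        my $to   = $end + $radius < $len ? $end + $radius : $len;
        my $snip = substr($text, $from, $start - $from)
                 . '[' . substr($text, $start, $end - $start) . ']'
                 . substr($text, $end, $to - $end);
        $snip = "...$snip" if $from > 0;       # mark text cut at the front
        $snip .= '...'     if $to < $len;      # mark text cut at the end
        push @snippets, $snip;
        last if @snippets >= $max;
    }
    return @snippets;
}
```

Cutting on character offsets is the simplest choice; trimming each snippet back to the nearest word boundary would look closer to a search engine's output at the cost of a little more code.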

So, is there a "fastest" way to do this, something better than using
HTML::Parser, setting a flag when I see the <body> tag, and then storing the
text (short of pre-processing the HTML documents)?
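For reference, the flag-based approach described above might look roughly like this with the HTML::Parser version 3 API. This is a sketch under assumptions: the function name body_text, skipping <script> and <style> content, and inserting a space at every tag boundary are illustrative choices, not from the original post. Per-tag depth counters stand in for the single flag so that nested or repeated tags do not reset the state.

```perl
use strict;
use warnings;
use HTML::Parser 3.00 ();

# Hypothetical helper: return the text inside <body> as one string,
# skipping anything inside <script> or <style>.
sub body_text {
    my ($html) = @_;
    my %inside;    # tag name => current nesting depth (a counter, not just a flag)
    my @chunks;

    # Called for every start/end tag; the argspec passes the lowercased tag
    # name plus a literal '+1' or '-1'. A space is recorded so text from
    # adjacent elements (e.g. <p>a</p><p>b</p>) does not run together.
    my $tag = sub {
        my ($name, $delta) = @_;
        $inside{$name} += $delta;
        $inside{$name} = 0 if $inside{$name} < 0;   # ignore stray end tags
        push @chunks, ' ';
    };

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each run of text as a single event
        start_h       => [ $tag, "tagname, '+1'" ],
        end_h         => [ $tag, "tagname, '-1'" ],
        text_h        => [
            sub {
                my ($text) = @_;
                return unless $inside{body};
                return if $inside{script} || $inside{style};
                push @chunks, $text;
            },
            "dtext",    # text with entities such as &amp; already decoded
        ],
    );
    $p->parse($html);
    $p->eof;            # flush any text still buffered by the parser

    my $text = join '', @chunks;
    $text =~ s/\s+/ /g;    # collapse runs of whitespace
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}
```

HTML::Parser reports only the tags actually present in the markup, so a page that omits the <body> tag yields an empty string with this sketch.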

Thanks,

Bill Moseley
mailto:moseley@hank.org



nntp.perl.org: Perl Programming lists via nntp and http.