perl.libwww | Postings from October 2001

Re: HTML::Parser question

Reinier Post
October 30, 2001 14:19
On Mon, Oct 29, 2001 at 10:00:23PM -0600, ADJE WebMail Technical Support Team wrote:
> Question: How do I extract the plain text from an HTML file, or, put
> another way, how do I remove the HTML markup, leaving just the plain
> text?  I have looked at the example provided in HTML::Parser, in
> particular
> HTML-Parser-3.25/eg/htext
> which comes close to what I need. However, I would like to store the
> plain text in a variable, as opposed to having it printed to STDOUT
> (standard output). Any ideas?


  perl -MLWP::Simple -MHTML::TreeBuilder \
    -e 'my $text = HTML::TreeBuilder->new' \
    -e '->parse(LWP::Simple::get("http://www/"))->eof->as_text;' \
    -e 'print $text'

You probably want to improve on it.
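
One way the one-liner might be turned into a small script is sketched below. It is only an illustration under a few assumptions: the helper name html_to_text and the sample HTML string are made up for this example and are not part of any module, and the input is taken from a string rather than fetched with LWP::Simple. It signals end of input with eof before reading the text, and frees the tree afterwards.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# html_to_text is a hypothetical helper name chosen for this sketch.
# It takes a string of HTML and returns the text content as a string.
sub html_to_text {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);    # feed the document to the parser
    $tree->eof;             # signal end of input so the tree is completed

    my $text = $tree->as_text;    # text of all elements, markup dropped

    $tree->delete;          # free the tree (older versions need this to
                            # break circular references)
    return $text;
}

# Sample input, made up for illustration.
my $html = '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>';

my $text = html_to_text($html);    # the plain text is now in a variable
print "$text\n";
```

Note that as_text simply joins the text pieces together, so text from adjacent block elements may run together without a separator. If that matters, walking the tree yourself and adding whitespace between elements is probably the next improvement.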

