perl.beginners | Postings from March 2002

Re: use of HTML::Parser, HTML::FormatText

March 31, 2002 12:15

On Sunday, March 31, 2002, at 11:50, M z wrote:

> Hello,
> In conjunction, I was looking into this HTML module to
> strip out all the HTML I have in several files.
> Namely, the data I want is between tags:
> <tag>data</tag>

I would look at getting the HTML::TreeBuilder module. It sounds
like you need to get a copy of nmake, or find a PPM package, so you
can install these modules where they belong.
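A minimal sketch of how HTML::TreeBuilder answers the original question (pull out the text between a pair of tags), assuming the module is installed. The helper name texts_between_tags is hypothetical, and the example uses a real tag (b) in place of the question's placeholder <tag>; substitute the tag name you actually need. It parses a string with parse and eof, finds every matching element with look_down, and collects each element's text with as_text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the text inside every <$tag_name> element.
sub texts_between_tags {
    my ($html, $tag_name) = @_;

    my $tree = HTML::TreeBuilder->new;   # start with an empty tree
    $tree->parse($html);                 # feed it the HTML text
    $tree->eof;                          # tell the parser input is finished

    # look_down('_tag', NAME) matches elements by tag name, in document order
    my @texts = map { $_->as_text } $tree->look_down('_tag', $tag_name);

    $tree->delete;                       # free the tree explicitly
    return @texts;
}

my @found = texts_between_tags('<p><b>data</b> and <b>more</b></p>', 'b');
print "$_\n" for @found;
```

Calling delete at the end matters for long-running scripts that process many files, since element trees hold parent/child references to each other.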

As for code illustrations, try:

an illustration of the full-on wackaDoodle code, where I was
working on an 'all singing, all dancing' CGI and command-line tool.

You would want to look at the parseTreeBack subroutine:

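  use strict;
  use warnings;
  use HTML::TreeBuilder;

  sub parseTreeBack {
      my ($html) = @_;                     # assumed: raw HTML text passed in

      my $tree = HTML::TreeBuilder->new;   # empty tree
      $tree->parse($html);                 # fill it from the HTML text
      $tree->eof;                          # no more input

      my @title = $tree->look_down("_tag", "title");

      my $page = '';

      foreach my $t (@title) {
          foreach my $item_r ( $t->content_refs_list ) {
              next if ref $$item_r;        # skip child elements, keep text
              $page .= "$$item_r \n";
          }
          $page .= "\n";
      }

      $tree->delete;                       # free the tree when done
      return $page;
  }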


That basic structure is how I get the 'content' of the 'title'
element out of the HTML.

I repeat that same basic trick to parse out the rows and tables
for other material, since I need to pull out of:

<HTML><HEAD><TITLE>List Grovellor Says</TITLE>
</HEAD><BODY><H1 ALIGN="center">List Grovellor Says</H1><hr>
<TABLE WIDTH="60%" ALIGN="center">
<TR VALIGN="TOP"><TH ALIGN="center" COLSPAN="2"> = Frodo found in hobbits =</TH></TR>
<TR VALIGN="TOP"><TD>Frodo Baggins</TD> <TD></TD></TR>
</TABLE><br><hr align="center" width="50%"><br></BODY></HTML>

the fact that I found "Frodo" on the hobbits mailing lists, and
what his email address is.
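A minimal sketch of repeating the look_down trick for rows and cells, assuming HTML::TreeBuilder is installed. The helper name table_rows is hypothetical, and the HTML is a trimmed copy of the table from the sample page above (not the full page). For each <tr> it collects the text of every <th> or <td> cell, using a code-ref criterion to match either cell type.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Trimmed copy of the table from the sample page (not the whole page).
my $html = <<'HTML';
<TABLE WIDTH="60%" ALIGN="center">
<TR VALIGN="TOP"><TH ALIGN="center" COLSPAN="2"> = Frodo found in hobbits =</TH></TR>
<TR VALIGN="TOP"><TD>Frodo Baggins</TD> <TD></TD></TR>
</TABLE>
HTML

# Hypothetical helper: one array ref of cell texts per table row.
sub table_rows {
    my ($markup) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($markup);
    $tree->eof;

    my @rows;
    foreach my $tr ( $tree->look_down('_tag', 'tr') ) {
        # a code-ref criterion lets one look_down match header or data cells
        my @cells = $tr->look_down(
            sub { $_[0]->tag eq 'th' or $_[0]->tag eq 'td' }
        );
        push @rows, [ map { $_->as_text } @cells ];
    }

    $tree->delete;    # free the tree explicitly
    return @rows;
}

foreach my $row ( table_rows($html) ) {
    print join(' | ', @$row), "\n";
}
```

Returning plain array refs keeps the tree-walking separate from whatever you do with the rows afterwards (printing, writing CSV, and so on).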

Which is to say, I found HTML::TreeBuilder simpler to use than trying
to work out HTML::Parser and HTML::FormatText directly. It provides
some 'class extensions', and the specific trick above is bootlegged
from its POD. But it works.


