develooper Front page | perl.beginners | Postings from April 2002

Re: use of HTML::Parser, HTML::FormatText

From: M z
Date: April 4, 2002 23:46
Subject: Re: use of HTML::Parser, HTML::FormatText
Message ID: 20020405074638.93853.qmail@web20905.mail.yahoo.com
drieux

I don't think I'm using this right. Can you help?

When I tried this little snippet on a basic HTML page:


#!C:/perl/bin -w

use HTML::Tree;
use HTML::Tagset;

print "which one?: "; 
chomp($in = <STDIN>);
open(X, "<$in") || die "can't read this! ($!)";
open(X1, ">wow");

$tree = HTML::Tree->new();
$tree->parse_file(<X>);
print X1 "$tree\n";


My output was:
HTML::TreeBuilder=HASH(0x177f19c)

I think this may be a really silly question, but
please help!
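
For what it's worth, HTML::TreeBuilder=HASH(0x177f19c) is what Perl prints when an object reference is interpolated into a string, so the snippet above wrote the tree object itself rather than its contents. Also, <X> in the argument list reads lines from the file and hands them to parse_file, which expects a file name or a filehandle. Below is a minimal sketch of one way to get the text out, assuming HTML::TreeBuilder is installed. The sub name html_file_to_text is made up for illustration and is not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (name made up for this sketch): parse one HTML
# file and return its text content with the markup stripped.
sub html_file_to_text {
    my ($path) = @_;

    open(my $fh, '<', $path) or die "can't read $path: $!";

    my $tree = HTML::TreeBuilder->new;   # empty tree
    $tree->parse_file($fh);              # takes a file name or a filehandle
    close($fh);

    my $text = $tree->as_text;           # text of the whole tree, tags removed
    $tree->delete;                       # release the tree's memory

    return $text;
}

# Command-line use: perl thisscript.pl page.html
if (@ARGV) {
    print html_file_to_text($ARGV[0]), "\n";
}
```

Calling $tree->as_HTML instead of $tree->as_text would give the markup back rather than the plain text.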



--- drieux <drieux@wetware.com> wrote:
> 
> On Sunday, March 31, 2002, at 11:50 , M z wrote:
> 
> > hello,
> >
> > in conjunction, I was looking into this module HTML to
> > take out all the HTML I have in several files.
> > Namely, the data I want is between tags
> > <tag>data</tag>
> 
> I would look at getting the HTML::TreeBuilder module - sounds
> like you need to get a copy of nmake - or find a ppm for installing
> these where they belong.
> 
> As for code illustrations, try:
> 
> http://www.wetware.com/drieux/src/unix/perl/OK.UglyCode.txt
> 
> an illustration of the full on wackaDoodle code, where I was
> working on an 'all singing, all dancing' - cgi and command line tool.
> 
> you would want to look at the
> 
>   sub parseTreeBack {
> 
>     ....
> 
>     my $tree = HTML::TreeBuilder->new; # empty tree
>     $tree->parse($res->content);
> 
> 
>     my @title = $tree->look_down("_tag", "title");
> 
>     my $page = '';
> 
>     foreach my $t (@title) {
>         foreach my $item_r ( $t->content_refs_list ) {
>             next if ref $$item_r;
>             $page .=  "$$item_r \n";
>         }
>         $page .= "\n";
>     }
> 
>     ....
>   }
> 
> that basic structure is how I get the 'content' of the 'title'
> out of the html....
> 
> I repeat that basic trick set to parse out the rows and tables
> for other stuff - since I need to parse out of :
> 
> " <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> <HTML><HEAD><TITLE>List Grovellor Says</TITLE>
> </HEAD><BODY><H1 ALIGN="center">List Grovellor Says</H1><hr><TABLE
> WIDTH="60%" ALIGN="center"><TR VALIGN="TOP"><TH ALIGN="center" COLSPAN="2"
>  > = Frodo found in hobbits =</TH></TR> <TR VALIGN="TOP"><TD>Frodo Baggins<
> /TD> <TD>frodo@shire.com</TD></TR></TABLE><br><hr align="center"
> width="50%"><br></BODY></HTML> "
> 
> the fact that I found "Frodo" on the hobbits mailing lists, and
> that he has the email address frodo@shire.com -
> 
> 
> which is to say I found the TreeBuilder simpler to use than trying
> to work out the HTML::Parser and HTML::FormatText stuff directly,
> it provides some 'class extensions' - and the specific trick above
> is bootlegged from the POD. But it works.
> 
> ciao
> drieux
> 
> ---
> 
> 




