Re: use of HTML::Parser, HTML::FormatText
From: M z
Date: April 4, 2002 23:46
Subject: Re: use of HTML::Parser, HTML::FormatText
Message ID: 20020405074638.93853.qmail@web20905.mail.yahoo.com
drieux,

I don't think I'm using this right... can you help?
When I tried this little snippet on a basic HTML page:
#!C:/perl/bin -w
use HTML::Tree;
use HTML::Tagset;
print "which one?: ";
chomp($in = <STDIN>);
open(X, "<$in") || die "can't read this! ($!)";
open(X1, ">wow");
$tree = HTML::Tree->new();
$tree->parse_file(<X>);
print X1 "$tree\n";
my output was:
HTML::TreeBuilder=HASH(0x177f19c)
I think this may be a really silly question, but
please help!
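
For reference, `HTML::TreeBuilder=HASH(0x177f19c)` is what Perl prints when an object reference is interpolated into a string, so the tree object exists but was never converted to text. Note also that `<X>` hands `parse_file` a line read from the file rather than the file name or handle. The following is only a minimal sketch of one way to get text out of a tree, assuming HTML::TreeBuilder is installed. The sample HTML string and the variable names are made up for illustration; with a real file, `$tree->parse_file($in)` would take the place of the `parse`/`eof` pair.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input, used here so the sketch needs no file on disk.
my $html = '<html><head><title>Demo</title></head>'
         . '<body><p>Hello, <b>world</b></p></body></html>';

my $tree = HTML::TreeBuilder->new;   # empty tree
$tree->parse($html);                 # feed the markup to the parser
$tree->eof;                          # signal end of input

# as_text returns the text content of the element with the tags stripped.
my $text = $tree->as_text;

# look_down finds elements by criteria; "_tag" matches on the tag name.
my ($title_el) = $tree->look_down('_tag', 'title');
my $title = $title_el ? $title_el->as_text : '';

print "title: $title\n";
print "text:  $text\n";

$tree->delete;                       # release the tree when finished
```

Printing `$tree->as_text` (or `$tree->as_HTML`) to the output handle, instead of `"$tree\n"`, is the part that replaces the `HASH(...)` line with actual content.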
--- drieux <drieux@wetware.com> wrote:
>
> On Sunday, March 31, 2002, at 11:50 , M z wrote:
>
> > hello,
> >
> > in conjunction, I was looking into this module HTML to
> > take out all the HTML I have in several files.
> > Namely, the data I want is between tags
> > <tag>data</tag>
>
> I would look at getting the HTML::TreeBuilder module - sounds
> like you need to get a copy of nmake - or find a ppm for installing
> these where they belong.
>
> As for code illustrations, try:
>
> http://www.wetware.com/drieux/src/unix/perl/OK.UglyCode.txt
>
> an illustration of the full on wackaDoodle code, where I was
> working on an 'all singing, all dancing' - cgi and command line tool.
>
> you would want to look at the
>
> sub parseTreeBack {
>
>     ....
>
>     my $tree = HTML::TreeBuilder->new; # empty tree
>     $tree->parse($res->content);
>
>     my @title = $tree->look_down("_tag", "title");
>
>     my $page = '';
>
>     foreach my $t (@title) {
>         foreach my $item_r ( $t->content_refs_list ) {
>             next if ref $$item_r;
>             $page .= "$$item_r \n";
>         }
>         $page .= "\n";
>     }
>
>     ....
> }
>
> that basic structure is how I get the 'content' of the 'title'
> out of the html....
>
> I repeat that basic trick set to parse out the rows and tables
> for other stuff - since I need to parse out of:
>
> " <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
> <HTML><HEAD><TITLE>List Grovellor Says</TITLE>
> </HEAD><BODY><H1 ALIGN="center">List Grovellor Says</H1><hr>
> <TABLE WIDTH="60%" ALIGN="center">
> <TR VALIGN="TOP"><TH ALIGN="center" COLSPAN="2"> = Frodo found in hobbits =</TH></TR>
> <TR VALIGN="TOP"><TD>Frodo Baggins</TD> <TD>frodo@shire.com</TD></TR>
> </TABLE><br><hr align="center" width="50%"><br></BODY></HTML> "
>
> the fact that I found "Frodo" on the hobbits mailing lists, and
> that he has the email address frodo@shire.com -
>
>
> which is to say I found the TreeBuilder simpler to use than trying
> to work out the HTML::Parser and HTML::FormatText stuff directly,
> it provides some 'class extensions' - and the specific trick above
> is bootlegged from the POD. But it works.
>
> ciao
> drieux