TreeBuilder and non-standard tags

From: Jose Quesada
Date: December 27, 2000 11:14
Subject: TreeBuilder and non-standard tags
Message ID: 3A4A429D.9849EC59@psych.colorado.edu

Hi,

I'm trying to extract the text inside some non-standard tags such as
<titulo></titulo> or <texto></texto> (a kind of rudimentary XML mixed in
with normal HTML code). I'm using HTML::TreeBuilder.

Reading the docs, I have made sure that:

 $root->implicit_body_p_tag(value) was set to true

 $root->ignore_unknown(value) was set to false

But it doesn't catch anything. What am I missing?

Thank you very much in advance.


Here is the example HTML:

<b><titulo>Se presenta la obra p&oacute;stuma de Cruz Mart&iacute;nez
<!--TITULO##2:2--></font></titulo></b>
</p>

And here is the Perl code used:

use HTML::TreeBuilder;

sub get_heading {
    my ($file) = @_;
    my $tree = HTML::TreeBuilder->new;
    my $implicit  = $tree->implicit_tags();
    my $ignoreoff = $tree->ignore_unknown(0);
    print "implicit $implicit\n";
    print "ignoreoff $ignoreoff\n";
    $tree->parse_file($file);

    # try to collect the non-standard elements
    my @h1 = $tree->look_down('texto', '/texto');
    die "what, no titles here?" unless @h1;

    my @result;
    for my $i (0 .. $#h1) {
        if ($h1[$i]) {
            $result[$i] = $h1[$i]->as_text;
        } else {
            warn "No heading in $file?";
        }
    }
    $tree->delete;    # clear memory
    print "results: @result\n";
    return @result;
}

# use it
my @titleTree = get_heading($ARGV[0]);
my $cont = join(' ', @titleTree);
print "$cont\n";

Thanks a lot,

Jose
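
For reference, here is a minimal sketch of one way such a lookup could be written. It assumes that look_down takes attribute/value pairs, with the element name matched through the '_tag' pseudo-attribute, as described in the HTML::Element documentation. The inline $html string is a simplified stand-in for the sample above, and the variable names are purely illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Simplified stand-in for the sample markup (assumption: one <titulo> element).
my $html = '<p><b><titulo>Se presenta la obra p&oacute;stuma de '
         . 'Cruz Mart&iacute;nez</titulo></b></p>';

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);    # keep unknown elements such as <titulo> in the tree
$tree->parse($html);
$tree->eof;                  # signal end of input to the parser

# Match on the element name via the '_tag' pseudo-attribute.
my @titles = map { $_->as_text } $tree->look_down('_tag', 'titulo');

print "$_\n" for @titles;
$tree->delete;               # free the tree explicitly
```

Note that ignore_unknown(0) is called before any parsing, since the setting only affects tags seen afterwards.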


