perl.libwww | Postings from December 2000

TreeBuilder and non-standard tags

Jose Quesada
December 27, 2000 11:14

I'm trying to extract the text inside some non-standard tags such as
<titulo></titulo> or <texto></texto> (a kind of rudimentary XML mixed in
with normal HTML code). I'm using HTML::TreeBuilder.

Reading the docs, I have made sure that:

 $root->implicit_body_p_tag(value) was set to true

 $root->ignore_unknown(value) was set to false

But it still doesn't catch anything. What am I missing?

Thank you very much in advance.

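Here is the example HTML and Perl code used:

<b><titulo>Se presenta la obra p&oacute;stuma de Cruz Mart&iacute;nez

use HTML::TreeBuilder;

sub get_heading {
    my $tree = HTML::TreeBuilder->new;

    # Settings described above, applied before parsing and read back.
    $tree->implicit_body_p_tag(1);
    $tree->ignore_unknown(0);
    my $implicit  = $tree->implicit_body_p_tag;
    my $ignoreoff = $tree->ignore_unknown;
    print "implicit $implicit\n";
    print "ignoreoff $ignoreoff\n";

    $tree->parse_file($_[0]);

    # This is the call that finds nothing.
    my @h1 = $tree->look_down('texto', '/texto');
    die "what, no titles here?" unless @h1;

    my @result;
    for (my $i = 0; $i <= $#h1; $i++) {
        if ($h1[$i]) {
            $result[$i] = $h1[$i]->as_text;
        } else {
            warn "No heading in $_[0]?";
        }
    }
    print "results: @result\n";
    $tree->delete;    # clear memory
    return @result;
}

# use it
my @titleTree = get_heading($ARGV[0]);
my $cont = join(' ', @titleTree);
print "$cont\n";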
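
A note on the look_down call: in HTML::Element, look_down takes attribute/value pairs as criteria, so look_down('texto', '/texto') searches for elements that have an attribute named "texto" whose value is "/texto". The tag name is matched through the pseudo-attribute _tag. The following is only a minimal sketch of that usage; the inline sample document and the variable names are made up for illustration and are not taken from the original files.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample input (not the poster's real document).
my $html = '<html><body><b><titulo>Primer titular</titulo></b>'
         . '<texto>Cuerpo de la noticia</texto></body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);    # keep unknown tags in the tree; set before parsing
$tree->parse($html);
$tree->eof;                  # signal end of input when using parse()

# Match elements by tag name through the '_tag' pseudo-attribute.
my @titles = map { $_->as_text } $tree->look_down('_tag', 'titulo');
my @texts  = map { $_->as_text } $tree->look_down('_tag', 'texto');

print "titulo: $_\n" for @titles;
print "texto: $_\n"  for @texts;

$tree->delete;               # free the tree's memory
```

The options are set before any parsing because they affect how the tree is built, not how it is searched afterwards.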

Thanks a lot,

Jose