
Re: Bug in Marek::Pod::HTML

Sean M. Burke
December 4, 2000 10:33
At 08:49 AM 2000-12-04 +0100, Marek Rouchal DAT CAD HW Tel 25849 wrote:
>In order to include raw HTML the user supplies with =for and =begin, I
>need to parse it with HTML::Treebuilder to turn it into nodes of type
>HTML::Element for inclusion in what Pod::HTML produces (namely a

That reminds me of a more general question:
I generally say that TreeBuilder is for parsing only whole documents -- in
the same sense that a hammer is for banging on things.  It's okay to try to
use TreeBuilder to parse document-fragments, the same as it's okay to try
to use a hammer as a forceps -- but in both cases it will take some
improvisation and cleverness on your part.

But one thing I think might be helpful is a method I wrote, and keep
meaning to put in the next version of Element:

sub HTML::Element::highest_explicits {
  my(@stack) = ($_[0]);
  my @out;
  my $this;

  while(@stack) { # idiom for preorder traversal
    if(
      ref($this = shift @stack)
      and $this->{'_implicit'}
    ) {
      unshift @stack, @{$this->{'_content'} || next};
       # traverse it
    } else {
      push @out, $this; # and don't traverse under this
    }
  }
  return @out;
}
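To see what that traversal hands back, here is a minimal sketch that runs the same logic over hand-built hashes instead of real HTML::Element objects. The '_implicit' and '_content' keys mirror the ones the method reads; the '_tag' key, the function name highest_explicits_demo, and the little tree are all made up for illustration.

```perl
use strict;
use warnings;

# Same preorder idea as the method, as a plain function over plain hashes.
sub highest_explicits_demo {
  my @stack = ($_[0]);
  my @out;
  while (@stack) {
    my $this = shift @stack;
    if (ref($this) and $this->{'_implicit'}) {
      # implicit node: look at its children next, in document order
      unshift @stack, @{ $this->{'_content'} || [] };
    } else {
      # explicit element or bare text: keep it, and do not descend
      push @out, $this;
    }
  }
  return @out;
}

# Roughly the shape a parser might build for the fragment "<b>hi</b> there":
# implicit html, head and body wrapped around the explicit content.
my $b_elem = { _tag => 'b', _content => ['hi'] };
my $body   = { _tag => 'body', _implicit => 1, _content => [ $b_elem, ' there' ] };
my $head   = { _tag => 'head', _implicit => 1 };
my $tree   = { _tag => 'html', _implicit => 1, _content => [ $head, $body ] };

my @docfrag = highest_explicits_demo($tree);
print join(', ', map { ref($_) ? "<$_->{'_tag'}>" : "text '$_'" } @docfrag), "\n";
# prints: <b>, text ' there'
```

Note that the text segment comes back in the list alongside the b element, since anything that is not an implicit element is kept as-is.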

When you say
  $treelet->eof(); #don't forget to do this!
  @docfrag = $treelet->highest_explicits;
you get the list of the highest non-implicit (=explicit) element nodes in
the tree, plus any text segments sitting directly under implicit elements.
It is possible to get really odd results out of this, but only with
nonsensical input code, I think.  (This might be followed by something
like:  for(@docfrag) { $_->detach if ref($_) }; )

Everything is happier, BTW, if input code is zero or more self-contained
elements (as opposed to ending on an incomplete element anywhere in there).

If you needed to see /whether/ that's the case, see what the $treelet->pos
is.  In theory, if it points to an explicit element, the source
code-fragment wasn't complete.  However, this would scream in the case of
a fragment that ends with a p left open (its end tag omitted),
because the pos is still on the explicit p there.  So consider something
where you forgive left-open elements whose end tags are normally omissible,
like maybe:

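  require HTML::Tagset;
  my $pos = $treelet->pos;  # where the parser left off
  my @up_pos = ($pos, $pos->lineage);  # that element plus all its ancestors
  my $saw_incomplete;
  foreach my $e (@up_pos) {
    # an explicit element counts as incomplete unless its end tag is omissible
    ++$saw_incomplete unless $e->implicit
                      or $HTML::Tagset::optionalEndTag{$e->tag};
  }
  die "GLEIVEN! GLAH $saw_incomplete!" if $saw_incomplete;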
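As a rough illustration of how that check sorts fragments, here is a sketch that applies the same counting rule to hand-written lineages rather than real elements. The %optional_end hash is only a stand-in for %HTML::Tagset::optionalEndTag (its contents here are an assumption for the demo), and count_incomplete and the lineages are hypothetical.

```perl
use strict;
use warnings;

# Stand-in for %HTML::Tagset::optionalEndTag; contents assumed for the demo.
my %optional_end = map { ($_ => 1) } qw(p li dt dd);

# Each argument is [tag, implicit-flag], listed from the parser position
# up to the root, the way ($pos, $pos->lineage) would be ordered.
sub count_incomplete {
  my @up_pos = @_;
  my $saw_incomplete = 0;
  foreach my $e (@up_pos) {
    my ($tag, $implicit) = @$e;
    # same rule as above: explicit and not omissible means incomplete
    ++$saw_incomplete unless $implicit or $optional_end{$tag};
  }
  return $saw_incomplete;
}

my @root = (['body', 1], ['html', 1]);

# fragment ended inside an explicit p: forgiven
my $open_p = count_incomplete(['p', 0], @root);
# fragment ended inside an explicit b within a p: one offender
my $open_b = count_incomplete(['b', 0], ['p', 0], @root);
# fragment fully closed, position back on the implicit body
my $closed = count_incomplete(@root);

print "$open_p $open_b $closed\n";  # prints: 0 1 0
```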

or maybe you could get away with just:

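  require HTML::Tagset;
  my $pos = $treelet->pos;  # where the parser left off
  my @up_pos = ($pos, $pos->lineage);  # that element plus all its ancestors
  my $saw_incomplete;
  foreach my $e (@up_pos) {
    # stop climbing at the first implicit element
    last if $e->implicit;
    ++$saw_incomplete unless $HTML::Tagset::optionalEndTag{$e->tag};
  }
  die "GLEIVEN! GLAH $saw_incomplete!" if $saw_incomplete;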

I'm not sure there'd be a practical difference, assuming sane code.
But I'm not sure how it'd behave with mildly strange code, like any of:





But having to assume sane input is not too much of a problem -- assuming
that everyone agrees on what sanity is, and that it /is/ tested on some
examples of reasonably sane code.

Sean M. Burke