perl.libwww | Postings from February 2001

considering HTML::Element's $tree->extract_links

Sean M. Burke
February 24, 2001 16:11
Some clever person wrote me earlier this month and suggested adding a
feature to HTML::Element's extract_links method, and I want to
run it past people who actually rely on the method's current behavior.

Currently this is what extract_links does:

Returns links found by traversing the element and all of its children
and looking for attributes (like "href" in an "a" element, or "src" in
an "img" element) whose values represent links.  The return value is a
reference to an array.  Each element of the array is a reference to
an array with two items: the link-value and the element that has the
attribute with that link-value.  You may or may not end up using the
element itself -- for some purposes, you may use only the link value.

You might specify that you want to extract links from just some kinds
of elements (instead of the default, which is to extract links from
all the kinds of elements known to have attributes whose values
represent links).  For instance, if you want to extract links from
only "a" and "img" elements, you could code it like this:

  for (@{  $e->extract_links('a', 'img')  }) {
      my($link, $element) = @$_;
      print
        "Hey, there's a ", $element->tag,
        " that links to $link\n";
  }

What the person who wrote to me suggested was this:  make each item
in the returned array contain not two subitems (attribute_value,
$element), but THREE: (attribute_value, $element, attribute_name).
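
To make the proposal concrete, here is a minimal sketch of how calling
code might read the third subitem.  It does not call HTML::Element at
all: the list in $links is hand-built to mimic the PROPOSED return
shape, plain tag-name strings stand in for the element objects, and
describe_links is a hypothetical helper name used only for this
illustration.

```perl
use strict;
use warnings;

# Hand-built stand-in for the PROPOSED return value of extract_links:
# each sublist is [attribute_value, element, attribute_name].
# Real sublists would hold HTML::Element objects in the second slot;
# plain tag-name strings are used here so the sketch runs on its own.
my $links = [
    [ 'http://www.example.com/', 'a',   'href' ],
    [ 'logo.png',                'img', 'src'  ],
];

# Hypothetical helper: builds one descriptive line per link.
sub describe_links {
    my ($list) = @_;
    return map {
        my ($link, $element, $attr) = @$_;
        "$element has $attr=\"$link\"";
    } @$list;
}

print "$_\n" for describe_links($links);
```

The attribute name is what tells you, for instance, whether a link
came from an "href" or a "src" without having to look it up per
element type.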

I think this is a wonderful idea.

But I don't want to break old code.  Clearly all the SANE old
code I can imagine (as above) would continue to work.
But something like this:

  for (@{  $e->extract_links('a', 'img')  }) {
     my($element) = $_->[-1];
     # ... do something with $element ...
  }

would break completely, since the LAST (-1th) item of the
list would no longer be the element.  Also, things like this
hypothetical bit of lunacy:

  %linkies = map reverse(@$_), @{$e->extract_links};

would break.

But neither of those breaky things is exactly brilliant code.
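
The difference between the safe and the breaky styles can be checked
without HTML::Element at all.  The sketch below is only an
illustration under stated assumptions: @old_item and @new_item are
hand-built stand-ins for one sublist in the current two-item shape and
in the proposed three-item shape, and the string 'ELEMENT' stands in
for the element object.

```perl
use strict;
use warnings;

# Hand-built stand-ins for a single sublist, before and after the
# proposed change.  'ELEMENT' is a placeholder for an element object.
my @old_item = ('page.html', 'ELEMENT');
my @new_item = ('page.html', 'ELEMENT', 'href');

# List assignment takes items from the front, so an extra item
# appended at the end does not disturb it.
my ($link_old, $elem_old) = @old_item;
my ($link_new, $elem_new) = @new_item;

# Indexing from the end picks up whatever happens to be last.
my $last_old = $old_item[-1];    # the element
my $last_new = $new_item[-1];    # the attribute name, not the element

# Reversing each sublist into a hash relies on sublists of exactly
# two items; with three, the flattened list has no clean key/value
# pairing.  (Collected into an array here to avoid an odd-count hash.)
my %old_hash = map reverse(@$_), [@old_item];
my @new_flat = map reverse(@$_), [@new_item];

print "list assignment:  $elem_old / $elem_new\n";
print "last item:        $last_old / $last_new\n";
print "items per link:   ", scalar(@old_item), " / ", scalar(@new_flat), "\n";
```

Code that unpacks from the front is unaffected; only code that counts
from the end, or that assumes exactly two items per sublist, changes
behavior.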

For anyone who uses extract_links, I'm asking:  would any of your code
break if I added a third value to each sublist returned?

Sean M. Burke
