HTML-Parser-3.19_90

From: Gisle Aas
Date: March 13, 2001 13:46
Subject: HTML-Parser-3.19_90
Message ID: lrvgpd737g.fsf@caliper.ActiveState.com

HTML-Parser-3.19_90 should now be on CPAN.  This is kind of an alpha
release as it contains some new experimental features.

I have implemented internal filters to make it possible to reduce the
number of callbacks that the parser makes.  If you, for instance, are
just looking for certain links in an HTML document, you might only want
to have <a> tags reported.  Having the parser invoke a Perl callback
routine that only tests the given tag name and then returns is quite
expensive.

The filtering is managed with the following 3 methods:

     $p->ignore_tags( TAG, ... )
     $p->report_only_tags( TAG, ... )
     $p->ignore_elements( TAG, ... ) 

Events for any tags registered with $p->ignore_tags() are always
suppressed.  If any tags are registered with $p->report_only_tags(),
then events for all other tags are suppressed.  $p->ignore_elements()
goes further: it also suppresses events for tags and text inside the
registered elements.
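
A minimal sketch of how two of these filters might be used, assuming
the methods behave as described above.  The sample HTML and the
variable names are made up for illustration:

```perl
use strict;
use warnings;
use HTML::Parser ();

# Sample input; hypothetical, for illustration only.
my $html = '<p>See <a href="/one">one</a>, <b>bold</b> and '
         . '<a href="/two">two</a>.<style>p { color: red }</style></p>';

# Collect href values.  With report_only_tags('a') the callback is
# not invoked for <p>, <b> or <style>.
my @links;
my $links_p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { my ($tag, $attr) = @_; push @links, $attr->{href} },
                     'tagname, attr' ],
);
$links_p->report_only_tags('a');
$links_p->parse($html);
$links_p->eof;

# Collect text, skipping everything inside <style> elements.
my @chunks;
my $text_p = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { push @chunks, shift }, 'dtext' ],
);
$text_p->ignore_elements('style');
$text_p->parse($html);
$text_p->eof;

print join(' ', @links), "\n";
print join('', @chunks), "\n";
```

Without the filter, the first callback would have to test the tag name
itself and return early for every tag other than <a>.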

Does this seem appropriate and useful?  Any other filtering methods
you can think of?

The next feature is that we allow '@attr' in the argspec.  This is
similar to 'attr', but the key/value pairs are passed as individual
arguments to the callback instead of as a hash reference.  Passing
'@attr' is faster than passing 'attr' since no new hash has to be built
each time.  (In a little test I did, I got a 35% speed improvement with
a callback that did not do anything.)  This also improves compatibility
with XML::Parser style start events.
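
As a rough illustration of the calling convention, here is a sketch of
a start handler that receives the tag name followed by the attribute
pairs as a flat list.  The sample tag and the @seen array are
assumptions made for the example:

```perl
use strict;
use warnings;
use HTML::Parser ();

my @seen;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            # First argument is the tag name; the rest are
            # key, value, key, value, ... in document order.
            my ($tag, @pairs) = @_;
            push @seen, [ $tag, @pairs ];
        },
        'tagname, @attr',
    ],
);
$p->parse('<img src="a.png" alt="A">');
$p->eof;

print join(',', @{ $seen[0] }), "\n";
```

A callback that wants a hash can still build one itself with
my ($tag, %attr) = @_; but then it pays for the hash only when it needs it.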

The whole argspec can now also be wrapped in @{...} to signal
flattening.  This only makes a difference when the handler target is an
array.  A handler like this one:

   $p->handler(text => [], '@{dtext}');

is now much cheaper than the old:

   $p->handler(text => [], "dtext");

and it is easier to join together the collected string segments too.
A simple:

   $text = join("", @{$p->handler("text")});

should do.
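
To make the difference concrete, here is a small sketch that runs the
same input through both forms.  The input string and variable names
are made up for the example, and the argspec is single-quoted so that
Perl does not try to interpolate it:

```perl
use strict;
use warnings;
use HTML::Parser ();

my $html = '<p>Fish &amp; chips</p>';   # hypothetical sample input

# Flattened: the accumulator array holds the text strings directly.
my $flat_p = HTML::Parser->new(api_version => 3);
$flat_p->handler(text => [], '@{dtext}');
$flat_p->parse($html);
$flat_p->eof;
my $text = join('', @{ $flat_p->handler('text') });

# Unflattened: each event is stored as its own array reference,
# so the segments have to be unpacked before joining.
my $nested_p = HTML::Parser->new(api_version => 3);
$nested_p->handler(text => [], 'dtext');
$nested_p->parse($html);
$nested_p->eof;
my $nested = $nested_p->handler('text');
my $text2  = join('', map { @$_ } @$nested);

print "$text\n$text2\n";
```

Both strings come out the same; the flattened form just avoids
creating one small array per text event.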

Is this use of the '@' character in the argspec too confusing?  Perl
wants to interpolate arrays if you use double quotes, so the argspec
has to be single-quoted or the '@' escaped.  Can anybody think of a
better argspec syntax for this?

Regards,
Gisle


