
Trouble with HeadParser processing a malformed comment

From: Rob Dixon
Date: March 2, 2003 09:01
Subject: Trouble with HeadParser processing a malformed comment
Message ID: 20030302170105.40229.qmail@onion.perl.org

Hi all.

I have been trying to automate the retrieval of data from a WWW site
which happens to have a malformed comment in the HTML <head>
section. It looks like this:

<HTML>
<HEAD>
<! Created on 15/10/95 Amended by CH for Leicestershire etc
22/08/2001 ->

<TITLE>Library Catalogue</TITLE>
<META HTTP-EQUIV="REFRESH" content="2; URL=/www-bin/www_talis2">

</HEAD>

<BODY>
    :
    :

The comment declaration is correct, but the comment itself is
non-standard in that it is missing the leading and trailing pairs
of hyphens. Opera, Navigator and IE all handle this without
trouble, but HeadParser is blind to the <meta> information. It
produces the following debug trace:

START[html]
TEXT[
]
START[head]
TEXT[
]
TEXT[<! Created on 15/10/95 Amended by CH for Leicestershire etc
22/08/2001 ->

]
START[title]

which looks like it's starting OK as it has at least relegated
the comment to a category of 'text', and goes on to find the
starting tag for the title. Strangely though, it fails to find
anything after <title>, and skips the body text of the title
as well as the subsequent <meta> tag.

I am a little weak when it comes to DynaLoader and C
extensions, and I wondered if anybody had any thoughts
about this? I can achieve the desired result by setting

$ua->parse_head(0);

and then editing out the rogue comment and post-processing
explicitly with HeadParser, but it's not a solution that I like.
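
For reference, the workaround described above might look roughly like
the sketch below. It is only an illustration under stated assumptions:
the helper names (strip_malformed_comments, parse_cleaned_head,
fetch_head_headers) are hypothetical and not part of LWP, and the
regular expression simply assumes that any "<! ... >" declaration which
is neither a proper "<!--" comment nor a DOCTYPE can be thrown away.

```perl
use strict;
use warnings;

# Hypothetical helper: drop "<! ... >" declarations that are neither
# well-formed "<!-- ... -->" comments nor a DOCTYPE declaration.
# Assumption: such declarations carry nothing we need from the head.
sub strip_malformed_comments {
    my ($html) = @_;
    $html =~ s/<!(?!--|DOCTYPE\b)[^>]*>//gi;
    return $html;
}

# Hypothetical helper: run HTML::HeadParser over the cleaned text and
# return its HTTP::Headers object (title, meta http-equiv values, ...).
sub parse_cleaned_head {
    my ($html) = @_;
    require HTML::HeadParser;
    my $parser = HTML::HeadParser->new;
    $parser->parse(strip_malformed_comments($html));
    return $parser->header;
}

# Hypothetical helper: fetch a page with LWP's own head parsing turned
# off, then post-process the content explicitly.
sub fetch_head_headers {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->parse_head(0);    # stop LWP from running HeadParser itself
    my $response = $ua->get($url);
    die $response->status_line, "\n" unless $response->is_success;
    return parse_cleaned_head($response->content);
}
```

With something like this, fetch_head_headers($url)->header('Refresh')
would be the place to look for the <meta> refresh value, where $url
stands for the address of the page in question.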

Thanks,

Rob





