Front page | perl.libwww |
Postings from August 2003
From: Gisle Aas
August 14, 2003 23:13
Message ID: firstname.lastname@example.org
FYI: I just upload v3.29 of the HTML-Parser to CPAN. These are the
changes since the last release:
Setting xml_mode now implies strict_names also for end tags.
Avoid warning from Visual C. Patch by <email@example.com>.
64-bit fix from Doug Larrick <firstname.lastname@example.org>
Try to parse similar to Mozilla/MSIE in certain edge cases.
All these are outside of the official definition of HTML but
HTML spam often tries to take advantage of these.
- New configuration attribute 'strict_end'. Unless enabled
we will allow end tags to contain extra words or stuff
that look like attributes before the '>'. This means that
tags like these:
</foo ">" ignored>
are now all parsed as a 'foo' end tag instead of text.
Even if the extra stuff looks like attributes they will not
be reported if requested via the 'attr' or 'tokens' argspecs
for the 'end' handler.
- Parse '</:comment>' and '</ comment>' as comments unless
strict_comment is enabled. Previous versions of the parser
would report these as text. If these comments contain
quoted words prefixed by space or '=' these words can
contain '>' without terminating the comment.
- Parse '<! "<>" foo>' as comment containing ' "<>" foo'.
Previous versions of the parser would terminate the comment
at the first '>' and report the rest as text.
- Legacy comment mode: Parse with comments terminated with a
lone '>' if no '-->' is found before eof.
- Incomplete tag at eof is reported as a 'comment' instead
of 'text' unless strict_comment is enabled.
by Gisle Aas