

Gisle Aas
August 14, 2003 23:13
FYI:  I just uploaded v3.29 of HTML-Parser to CPAN.  These are the
changes since the last release:

     Setting xml_mode now implies strict_names also for end tags.

     Avoid warning from Visual C.  Patch by <>.

     64-bit fix from Doug Larrick.

     Try to parse similarly to Mozilla/MSIE in certain edge cases.
     All of these fall outside the official definition of HTML, but
     HTML spam often tries to take advantage of them.

       - New configuration attribute 'strict_end'.  Unless it is
         enabled, we allow end tags to contain extra words or other
         content that looks like attributes before the '>'.  This
         means that tags like these:

            </foo foo="<ignored>">
            </foo ignored>
            </foo ">" ignored>

         are now all parsed as a 'foo' end tag instead of text.
         Even if the extra content looks like attributes, it will
         not be reported when requested via the 'attr' or 'tokens'
         argspecs for the 'end' handler.

       - Parse '</:comment>' and '</ comment>' as comments unless
         strict_comment is enabled.  Previous versions of the parser
         would report these as text.  If these comments contain
         quoted words prefixed by space or '=' these words can
         contain '>' without terminating the comment.

       - Parse '<! "<>" foo>' as a comment containing ' "<>" foo'.
         Previous versions of the parser would terminate the comment
         at the first '>' and report the rest as text.

       - Legacy comment mode:  Parse with comments terminated with a
         lone '>' if no '-->' is found before eof.

       - Incomplete tag at eof is reported as a 'comment' instead
         of 'text' unless strict_comment is enabled.
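
A minimal sketch of how the lenient defaults could be compared with
'strict_end' and 'strict_comment' enabled, using the end-tag examples
above.  The helper name parse_events and the sample input are
illustrative assumptions, not part of the release; the events actually
reported may differ between HTML-Parser versions, so no output is
shown here.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: parse $html and return a list of [event, detail]
# pairs.  %opt maps boolean option method names (such as strict_end)
# to the value they should be set to before parsing.
sub parse_events {
    my ($html, %opt) = @_;
    my @events;
    my $p = HTML::Parser->new(
        api_version => 3,
        end_h       => [ sub { push @events, [ end     => $_[0] ] }, "tagname" ],
        comment_h   => [ sub { push @events, [ comment => $_[0] ] }, "text" ],
        text_h      => [ sub { push @events, [ text    => $_[0] ] }, "text" ],
    );
    # Apply each requested option through its accessor method.
    for my $name (sort keys %opt) {
        $p->$name($opt{$name});
    }
    $p->parse($html);
    $p->eof;
    return @events;
}

# Sample input (an assumption): the three end tags from the list above,
# followed by one of the comment-like forms.
my $sample = join "\n",
    q{</foo foo="<ignored>">},
    q{</foo ignored>},
    q{</foo ">" ignored>},
    q{</ comment>};

my @lenient = parse_events($sample);
my @strict  = parse_events($sample, strict_end => 1, strict_comment => 1);

# Print both event streams so they can be compared by eye.
for my $run ([ lenient => \@lenient ], [ strict => \@strict ]) {
    my ($label, $events) = @$run;
    print "$label:\n";
    for my $ev (@$events) {
        (my $detail = $ev->[1]) =~ s/\n/\\n/g;   # keep each event on one line
        print "  $ev->[0]: $detail\n";
    }
}
```

With the defaults, the first three lines should show up as 'end'
events for 'foo'; with both strict options set, the same input is
expected to be reported mostly as text.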


Gisle