
Re: Conditional handling in HTML::Parser

Gisle Aas
July 10, 2001 11:17
Brent Baccala writes:

> I've got a set of scripts that alter HTML content (expected to be in
> Spanish) by adding a link to every word that triggers a lookup in a
> Spanish/English dictionary.  I use HTML::Parser.
> Anyway, I've come across some documents that don't parse right.  They
> appear to have been generated by Microsoft Office, and include tags like
> this:
> <![if !supportEmptyParas]>&nbsp;<![endif]>
> The "if" and "supportEmptyParas" end up getting flagged as text, even if
> I've called marked_sections(1).

This stuff does not follow the marked_sections syntax so I'm not
surprised.  As a marked section it would have to be expressed
something like:

  <![ %supportEmptyParas; [ &nbsp; ]]>

where %supportEmptyParas; is a parameter entity that expands to either
"IGNORE" or "INCLUDE".

I don't know SGML well enough to tell if this is something worth
supporting or if this stuff is valid SGML at all.  Does anybody else
know?

A simple hack to avoid this stuff might be to run something like
s/<!\[(?:if|endif).*?\]>//g on the text before feeding it to HTML::Parser.
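
As a rough sketch of that pre-filtering step (not tested against real
Office output): the helper name strip_office_conditionals is made up
for illustration, and the pattern assumes the markers always look like
the <![if ...]> and <![endif]> forms quoted above, with the bracket
directly after "<!".  It only removes the markers themselves and keeps
whatever sits between them.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not part of HTML::Parser):
# remove Office-style "<![if ...]>" and "<![endif]>" markers while
# keeping the content between them.
sub strip_office_conditionals {
    my ($html) = @_;
    # "<![" then "if" or "endif", then anything up to the closing "]>".
    # /g handles every marker in the document, /i tolerates upper case.
    $html =~ s/<!\[(?:if|endif)\b[^>]*\]>//gi;
    return $html;
}

my $input  = '<p><![if !supportEmptyParas]>&nbsp;<![endif]></p>';
my $output = strip_office_conditionals($input);
print "$output\n";    # prints: <p>&nbsp;</p>
```

The cleaned string can then be handed to the parser as usual, for
example with $p->parse($output); $p->eof;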

> Since I don't really know SGML, I'm not sure how this should be handled,
> or even if it can be handled without having the Microsoft schema (which
> I can't find) available to be parsed.  Anyway, I thought I'd let you
> know.  The URL of the original document is:
> and the page for my scripts is:
> Thanks for your work with HTML::Parser, it's made this script fairly
> easy to write.

Good to hear!

