perl.libwww | Postings from April 2001

Re: HTML::Entities

From: Gisle Aas
Date: April 11, 2001 09:53
Subject: Re: HTML::Entities
Message ID: lrk84rs7it.fsf@ActiveState.com

Robin Berjon <robin@knowscape.com> writes:

> I bumped into a problem today using the HTML::Entities module. I'm dealing
> with some XHTML into which I insert hidden input fields, no rocket science
> there. In order to protect the content of the fields, I'm encoding them.
> The problem occurs because the XHTML uses ' (&apos;) as attribute value
> delimiters -- legal in XML -- but HTML::Entities doesn't encode those by
> default. In fact, it doesn't seem to know about &apos;

The reason HTML::Entities doesn't know about &apos; is that it's not
mentioned in the HTML specs:

   http://www.w3.org/TR/html4/charset.html#entities
   http://www.w3.org/TR/html4/sgml/entities.html

It is part of XHTML, because it is part of XML.

A quick test with some HTML browsers I had access to reveals:

   Netscape 4.76    doesn't know about it
   Netscape 6       does
   Konqueror 1.9.8  doesn't know about it
   Lynx 2.8.3       decoded it as &#96; instead of &#39;

Given this quick survey, I think it would be unwise to just add it to
HTML::Entities unless we can make it so that it only affects decoding.
It seems more correct to continue to encode ' as &#39;
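
A minimal sketch of what "only affects decoding" could look like.  The
table and function names here (%entity2char, %char2entity,
decode_sketch, encode_sketch) are made up for illustration and are not
taken from the module source; only a handful of entities are listed.

```perl
use strict;
use warnings;

# Hypothetical lookup table, for illustration only.
my %entity2char = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",    # known to the decoder
);

# Build the reverse mapping, then drop the apostrophe so that
# encoding never emits &apos;.
my %char2entity = map { $entity2char{$_} => "&$_;" } keys %entity2char;
delete $char2entity{"'"};

# Decode decimal references and known named entities in a single pass;
# unknown named entities are left untouched.
sub decode_sketch {
    my ($text) = @_;
    $text =~ s{&(?:#(\d+)|(\w+));}{
        defined $1                ? chr($1)
      : exists $entity2char{$2}   ? $entity2char{$2}
      :                             "&$2;"
    }ge;
    return $text;
}

# Encode the five markup-significant characters; anything without a
# named entity in %char2entity falls back to a decimal reference.
sub encode_sketch {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$char2entity{$1} || sprintf("&#%d;", ord $1)/ge;
    return $text;
}

print encode_sketch(q{it's}), "\n";
print decode_sketch('it&apos;s'), "\n";
```

With this arrangement documents that already contain &apos; decode
correctly, while nothing we generate depends on a browser knowing it.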

> It's not a big problem for me as I know how to work around it, and I was
> inches away from submitting a patch, but I was wondering if there was a
> good reason why you hadn't included "'" in the list of default encoded
> characters? I believe it belongs there with '"', the latter being in the
> list precisely because of attribute values, which can be delimited by both.

But the HTML spec only mentions '"' so I think it makes sense to stick
with it for now, especially if we continue to encode ' as &#39;.
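
For the single-quoted attribute case, the workaround is presumably
along these lines: pass the set of characters to encode as the second
argument to encode_entities().  The field content and the input name
"payload" are invented for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical field content containing both kinds of quotes.
my $value = q{Robin's "test" <value>};

# The second argument names the characters to encode; the apostrophe
# is listed explicitly alongside <, >, & and ".
my $encoded = encode_entities($value, q{<>&"'});

# Single-quoted attribute delimiters, as in the XHTML being discussed.
print "<input type='hidden' name='payload' value='$encoded' />\n";
```

Since there is no &apos; in the encoding table, the apostrophe should
come out as a numeric character reference, which the browsers listed
above can be expected to handle.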

Regards,
Gisle
