develooper Front page | perl.fwp | Postings from May 2003

Re: Stripping HTML (angle brackets are not enough)

May 20, 2003 15:24

See also

some notes from that page:

Encoding 'special' characters is better than stripping them because it 
avoids loss of information. Encoding everything but known non-special 
characters is safer because it makes it harder to accidentally leave a 
special character unescaped.
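
To make the "loss of information" point concrete, here is a rough sketch in Python (used only for illustration; the thread itself is about Perl). The helper names strip_specials and encode_specials are made up for this example, and html.escape covers only a short fixed list of characters, so this shows stripping versus encoding, not the "encode everything but known safe characters" idea.

```python
import html
import re

def strip_specials(text: str) -> str:
    """Delete the special characters outright (lossy)."""
    return re.sub(r"[<>&\"']", "", text)

def encode_specials(text: str) -> str:
    """Replace each special character with an entity (reversible)."""
    return html.escape(text, quote=True)

comment = "if a < b && b > c"
stripped = strip_specials(comment)  # the comparison operators are gone for good
encoded = encode_specials(comment)  # the operators survive as entities
restored = html.unescape(encoded)   # decoding gives back the original text
```

The stripped version can never be turned back into what the user typed, while the encoded version decodes to exactly the original string.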

Angle brackets are not the only special characters.

Even if you think you have a full list of special characters, you may be 
wrong, particularly if you fail to specify a character encoding in your 
output HTML.
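
One way to avoid leaving the character encoding unspecified is to declare it in the Content-Type header and to encode the body with that same charset. The following is only a sketch in Python; html_response is a hypothetical helper, not part of any framework.

```python
def html_response(body: str, charset: str = "utf-8"):
    """Return (headers, payload) with the charset stated explicitly.

    Characters the chosen charset cannot represent are emitted as
    numeric character references instead of being dropped or mangled.
    """
    headers = {"Content-Type": f"text/html; charset={charset}"}
    payload = body.encode(charset, errors="xmlcharrefreplace")
    return headers, payload

# Assumed input: text containing a non-ASCII letter and two chevron characters.
headers, payload = html_response("caf\u00e9 \u2039b\u203a", charset="ascii")
```

With the charset stated, the browser has no reason to guess, and the set of bytes that can act as markup is at least well defined.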

The "escape everything but known safe characters" approach is taken by 
modules like URI::Escape, and works well. Unfortunately, this is not the 
approach taken by &CGI::escapeHTML so you may want to use 
&HTML::Entities::encode_entities or roll your own instead.
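
For the "roll your own" option, a minimal sketch of the whitelist idea might look like the following (again in Python for illustration only). The safe set here is an assumption chosen for the example, and encode_all_but_safe is a hypothetical name; widen the set only with characters you are sure are harmless in your context.

```python
import re

# Assumed safe set: ASCII letters, digits, space, and a little punctuation.
# Everything else, including characters nobody thought of, gets encoded.
_NOT_SAFE = re.compile(r"[^A-Za-z0-9 .,_-]")

def encode_all_but_safe(text: str) -> str:
    """Replace every character outside the safe set with a decimal entity."""
    return _NOT_SAFE.sub(lambda m: f"&#{ord(m.group())};", text)

sample = encode_all_but_safe("<b>hi</b>")
chevrons = encode_all_but_safe("\u2039script\u203a")
```

Because the pattern names what is allowed rather than what is forbidden, a forgotten special character ends up encoded by default instead of slipping through.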

I've read at another URL (I no longer recall what it was) that some 
browsers will interpret the chevron-type characters as acceptable 
substitutes for angle brackets. This is yet another reason to escape 
anything other than those specific characters you want to allow.

...and if the content will be stuck in a document which will be 
processed on the server side, be sure that no server-side directives are 
allowed through. Standard escaping should fix canonical server side 
includes, but if you're using Mason (with non-default tags) then you may 
have more characters to escape.

When in doubt, it's better to escape more characters than fewer. The hex 
or decimal encoded entity for 'a' (&#x61; or &#97;) will still render as 
an 'a' character.
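
A quick way to convince yourself that over-escaping is harmless is to decode the entities with a standard library decoder. This sketch uses Python's html.unescape as a stand-in for what a browser does when it renders the text.

```python
import html

hex_form = html.unescape("&#x61;")      # hexadecimal character reference
dec_form = html.unescape("&#97;")       # decimal character reference
word = html.unescape("&#99;&#97;&#116;")  # a fully escaped word still reads normally
```

All three decode to ordinary letters, so escaping a harmless character costs a few bytes and nothing else.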

