Movable Type User Manual: SANITIZE: CLEANING UP INCOMING DATA


SANITIZE: CLEANING UP INCOMING DATA

Note: The following applies only if you have turned on the Allow HTML in Comments option in your weblog configuration preferences.

Data submitted by visitors to your site should not be trusted blindly. If you allow HTML in your comments, for example, a visitor could submit malicious HTML, or scripts in JavaScript or PHP, in an attempt to run code on your site. Such code could do anything from reading cookies to reading private files on your server.

To protect your site, Movable Type cleans up ("sanitizes") any data submitted by visitors to your site. This includes any comment data and any TrackBack data. The cleanup removes any code (HTML or otherwise) that could compromise the security of your site. The sanitization process works by allowing only certain HTML tags; all other tags, and all processing and scripting instructions (PHP, JSP, JavaScript), are stripped.

The default set of allowed HTML tags and attributes is: a href, b, br, p, strong, em, ul, li, blockquote. You can override this setting globally with the GlobalSanitizeSpec setting in the mt.cfg file, and you can override it on a per-weblog basis in your weblog configuration. Note: Unless you know what you're doing, it is recommended that you stick with the defaults.
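
To make the allowlist idea concrete, here is a minimal sketch in Python of a sanitizer that keeps only listed tags and attributes and drops everything else. It is an illustration only, not Movable Type's own implementation; the names DEFAULT_ALLOWED, AllowlistSanitizer, and sanitize are hypothetical, and the allowlist simply mirrors the default set given above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist mirroring the manual's default set: tag -> allowed attributes.
DEFAULT_ALLOWED = {
    "a": {"href"}, "b": set(), "br": set(), "p": set(), "strong": set(),
    "em": set(), "ul": set(), "li": set(), "blockquote": set(),
}


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowed tags and attributes; drop everything else."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def _render(self, tag, attrs, closer):
        # Keep only the attributes that the allowlist names for this tag.
        kept = []
        for name, value in attrs:
            if name in self.allowed[tag]:
                safe = escape(value or "", quote=True)
                kept.append(' %s="%s"' % (name, safe))
        self.out.append("<%s%s%s>" % (tag, "".join(kept), closer))

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self._render(tag, attrs, "")

    def handle_startendtag(self, tag, attrs):
        if tag in self.allowed:
            self._render(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Plain text is kept, re-escaped so it cannot form new markup.
        self.out.append(escape(data, quote=False))

    # Processing instructions (<? ... ?>), comments, and declarations fall
    # through to HTMLParser's default no-op handlers and are dropped.


def sanitize(text, allowed=DEFAULT_ALLOWED):
    parser = AllowlistSanitizer(allowed)
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onclick="x()">hi</b>'))   # <b>hi</b>
print(sanitize('<div>text</div>'))           # text
```

Note that this sketch does not add missing closing tags; it only filters what is already there.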

One other feature of the sanitization process is that it will add closing tags for any tags left open in the sanitized text. For example, if a visitor to your site opens a <b> tag and forgets to close it, the sanitize process will add a </b> tag.
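
The tag-closing behavior can be sketched roughly as follows, in Python. This is a simplified illustration under stated assumptions, not the parser Movable Type actually uses: close_open_tags and its no_close parameter are hypothetical names, and the regular expression handles only straightforward, quoted-attribute markup.

```python
import re

# Matches an opening or closing tag; group 3 captures a trailing "/" (as in <br />).
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")


def close_open_tags(html, no_close=frozenset({"br"})):
    """Append closing tags for any tags still open at the end of the text."""
    stack = []
    for match in TAG_RE.finditer(html):
        closing, name, self_closed = match.group(1), match.group(2).lower(), match.group(3)
        if self_closed or name in no_close:
            continue  # tags like <br> or <br /> never need a closing tag
        if closing:
            if name in stack:
                # Pop back to (and including) the matching opening tag.
                while stack.pop() != name:
                    pass
        else:
            stack.append(name)
    # Close whatever is still open, innermost tag first.
    return html + "".join("</%s>" % name for name in reversed(stack))


print(close_open_tags("<b>bold text"))   # <b>bold text</b>
print(close_open_tags("<ul><li>item"))   # <ul><li>item</li></ul>
```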

Default Usage

By default, Sanitize is turned on automatically for the following tags:

This means that you don't need to modify your templates in order to make these tags safe. If you want to turn off sanitize for one of these tags, you can use the sanitize attribute:

<$MTPingTitle sanitize="0"$>

Overriding Defaults

Before you override the default sanitize specification, make sure that you have a good reason for doing so, and that you understand the format of the sanitize specification, described below.

The sanitize spec consists of HTML tag names separated by commas. For each tag, you must also list any attributes that you wish to allow, separated by spaces. Some examples:

This will allow a tags with the href attribute and b tags:

a href,b

This will allow p tags and br tags:

p,br/

Note the / in the br/ tag in this example. That is necessary because of the tag-closing feature mentioned above: if the parser sees only an opening <br> tag, it will think that it needs to close this tag at the end of the sanitized text. Adding the / after the tag name tells the parser that this tag does not need a closing tag.

Note that you must specify any allowed attributes for the tag, unless you want all of the attributes to be stripped. For example, if you allow the a tag, you would also want to allow the href attribute for that tag, or the following HTML:

<a href="http://www.foo.com/">

would be turned into this:

<a>

which probably isn't what you want.

If you wish to allow a certain attribute for any HTML tag in which it might appear, use a * as the tag name, followed by the list of attributes. For example:

br/,p,blockquote,* style

This will allow any of the following:

<br style="..." />
<p style="...">
<blockquote style="...">

Note that you must still explicitly list any tags that you want included; * just allows the attribute listed in any of those tags.
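
As a rough illustration of how a sanitize spec in this format could be read, the following Python sketch parses a spec string into the allowed attributes for each tag and the set of tags marked with a trailing /. The function name parse_sanitize_spec and its return shape are assumptions made for this example; Movable Type's internal representation may differ.

```python
def parse_sanitize_spec(spec):
    """Parse a spec such as 'br/,p,blockquote,* style'.

    Returns (tags, no_close): tags maps each allowed tag name to its set of
    allowed attributes; no_close holds the tags that were written with a
    trailing '/' and therefore need no closing tag.
    """
    tags = {}
    no_close = set()
    wildcard = set()
    for entry in spec.split(","):
        parts = entry.split()
        if not parts:
            continue  # tolerate stray commas
        name = parts[0].lower()
        attrs = {attr.lower() for attr in parts[1:]}
        if name == "*":
            wildcard |= attrs  # attributes allowed on every listed tag
            continue
        if name.endswith("/"):
            name = name[:-1]
            no_close.add(name)
        tags.setdefault(name, set()).update(attrs)
    # The wildcard adds attributes only to tags that were explicitly listed.
    for allowed in tags.values():
        allowed |= wildcard
    return tags, no_close


tags, no_close = parse_sanitize_spec("br/,p,blockquote,* style")
print(sorted(tags))       # ['blockquote', 'br', 'p']
print(sorted(no_close))   # ['br']
```

With this spec, every listed tag ends up with style as its only allowed attribute, while a tag that is not listed (a, for example) remains disallowed.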


Copyright © 2001, 2002 Ben Trott and Mena Trott. All Rights Reserved.