Sanitizing untrusted input for HTML meta-characters is an important technique for preventing cross-site scripting attacks. But even a sanitized input can be dangerous to use if it is modified further before a browser treats it as HTML. A seemingly innocent transformation that expands a self-closing HTML tag from <div attr="{sanitized}"/> to <div attr="{sanitized}"></div> may in fact cause cross-site scripting vulnerabilities.

Use a well-tested sanitization library if at all possible, and avoid modifying sanitized values further before treating them as HTML.

The following function transforms a self-closing HTML tag into a pair of opening and closing tags. It does so for all tags other than img and area, using a regular expression with two capture groups. The first capture group corresponds to the name of the tag, and the second capture group to the content of the tag.
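
A minimal sketch of such a function is given here for illustration. The name `expandSelfClosingTags` and the exact regular expression are assumptions made for this sketch (a simplified pattern in the spirit of those historically used for this purpose), not code from any particular library.

```javascript
// Hypothetical sketch: expand self-closing tags such as <div/> into <div></div>.
// Group 1 captures the tag name, group 2 the remaining content of the tag
// (its attributes). Tags whose name starts with "img" or "area" are skipped.
function expandSelfClosingTags(html) {
  const selfClosingTag = /<(?!img|area)([a-z][^\s\/>]*)([^>]*)\/>/gi;
  // BAD: unsafe when html contains attacker-influenced attribute values
  return html.replace(selfClosingTag, "<$1$2></$1>");
}

console.log(expandSelfClosingTags('<div class="box"/>'));
// <div class="box"></div>
```

On well-formed input like this the function behaves as intended, which is part of why the pattern looks innocent.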

While it is well known that regular expressions are ill-suited for parsing HTML, variants of this particular transformation pattern were long considered safe.

However, the function is not safe. As an example, consider the following string:
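
An illustrative string of this kind, chosen as an assumption for this sketch, is a single img element whose attribute values merely contain markup-like text. The variable name `sanitized` is hypothetical.

```javascript
// Illustrative input (an assumption for this sketch). A browser parses it as
// ONE <img> element with two attributes:
//   alt   = '<x'
//   title = '/><img src=url404 onerror=alert(1)>'
// The second "<img ...>" is only text inside the title attribute, so no
// script runs, and a sanitizer can reasonably let the string through.
const sanitized = '<img alt="<x" title="/><img src=url404 onerror=alert(1)>">';

console.log(sanitized);
```

Note that the string contains the character sequence `/>` inside an attribute value, which a regular expression has no way of distinguishing from the end of a real self-closing tag.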

When the above function transforms the string, it becomes a string that results in an alert when a browser treats it as HTML.
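
The effect can be reproduced with a self-contained sketch. Both the function and the input below are illustrative assumptions (a simplified regular expression and a hypothetical sanitized string), not code taken from a specific library.

```javascript
// Hypothetical transformation: group 1 = tag name, group 2 = rest of the tag.
function expandSelfClosingTags(html) {
  return html.replace(/<(?!img|area)([a-z][^\s\/>]*)([^>]*)\/>/gi, "<$1$2></$1>");
}

// Harmless as-is: a single <img> whose title attribute merely contains text
// that looks like markup.
const before = '<img alt="<x" title="/><img src=url404 onerror=alert(1)>">';

// The regex skips the outer <img, but matches '<x" title="/>' inside the
// attribute values. The "tag name" it captures is 'x"', quote included.
const after = expandSelfClosingTags(before);

console.log(after);
// <img alt="<x" title="></x"><img src=url404 onerror=alert(1)>">
```

The inserted `</x">` contains a double quote, which ends the title attribute early. The `>` that follows closes the outer img tag, so the second `<img src=url404 onerror=alert(1)>` is now parsed as a real element, and its error handler runs when the image fails to load.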

  • jQuery: Security fixes in jQuery 3.5.0.
  • OWASP: DOM based XSS Prevention Cheat Sheet.
  • OWASP: XSS (Cross Site Scripting) Prevention Cheat Sheet.
  • OWASP: Types of Cross-Site Scripting.
  • Wikipedia: Cross-site scripting.