Security checks bypass due to a Unicode transformation

If security checks or logical validation are performed before Unicode normalization, they can be bypassed: normalization may turn a character that passed the checks into a different, potentially dangerous one (a Unicode character collision). The validation considered here includes any character escaping, any regex validation, and any string manipulation (such as str.split).

Perform Unicode normalization before any security check or logical validation, so that the checks run on the normalized string.
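
A minimal sketch of this recommendation follows. It assumes Python's standard-library html.escape as a stand-in for the escaping step and NFKC as the normalization form; neither is prescribed by this page, and the function name normalize_then_escape is hypothetical.

```python
import html
import unicodedata


def normalize_then_escape(user_input: str) -> str:
    """Hypothetical helper: normalize first, then escape the normalized text."""
    # NFKC folds compatibility characters such as U+FE64 into their ASCII
    # equivalents (here U+003C) BEFORE any check is applied.
    normalized = unicodedata.normalize("NFKC", user_input)
    # The escaping step now sees the characters that will actually be used.
    return html.escape(normalized)


# U+FE64 and U+FE65 are the "small" less-than and greater-than signs.
safe = normalize_then_escape("\uFE64script\uFE65")
```

Because normalization runs first, the small less-than and greater-than signs are already ordinary angle brackets by the time the escaping step runs, so they are escaped like any other markup character.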

The following example shows how all checks performed by flask.escape() are bypassed when Unicode normalization is applied after escaping.

For instance, the character U+FE64 ( ﹤ ) is not filtered out by the Flask escape function. Compatibility normalization (NFKC or NFKD) then transforms it into U+003C ( < ).
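
The bypass described above can be reproduced with a short sketch. It uses Python's standard-library html.escape in place of the Flask escape function so that it runs without extra dependencies; that substitution, the NFKC form, the payload, and the function name escape_then_normalize are assumptions made for illustration.

```python
import html
import unicodedata


def escape_then_normalize(user_input: str) -> str:
    """Hypothetical vulnerable helper: escapes first, normalizes afterwards."""
    # U+FE64 and U+FE65 are not in the set of characters the escaper rewrites,
    # so they pass through this step untouched.
    escaped = html.escape(user_input)
    # Normalizing AFTER the check turns them into real angle brackets.
    return unicodedata.normalize("NFKC", escaped)


# Illustrative payload built from the small less-than / greater-than signs.
payload = "\uFE64img src=x\uFE65"
result = escape_then_normalize(payload)
```

The escaping step leaves the payload unchanged, and the later normalization produces an unescaped tag, which is the character collision this rule is meant to catch.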

  • Research study: Unicode vulnerabilities that could bYte you and Unicode pentest cheatsheet.