A regular expression range can accidentally match more than was intended. For example, the regular expression /[a-zA-z]/ matches every lowercase and uppercase letter, but it also matches the six characters [ \ ] ^ _ ` that lie between Z and a in ASCII.
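One way to see the extra matches is to enumerate the printable ASCII characters that a class accepts. The following is a minimal sketch using Python's re module; the helper name matched_ascii is hypothetical and exists only for this illustration.

```python
import re

def matched_ascii(char_class: str) -> str:
    """Return the printable ASCII characters matched by the given character class."""
    pattern = re.compile(char_class)
    return "".join(chr(c) for c in range(0x20, 0x7F) if pattern.fullmatch(chr(c)))

intended = set(matched_ascii(r"[a-zA-Z]"))
actual = matched_ascii(r"[a-zA-z]")

# Characters accepted only because the second range ends at lowercase z.
extra = "".join(ch for ch in actual if ch not in intended)
print(extra)  # prints [\]^_`
```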

In other cases, a dash inside a character class is left unescaped, so it is interpreted as a range operator rather than as a literal dash. For example, in the character class [a-zA-Z0-9%=.,-_] the last range, ,-_, matches the 52 characters between , and _ (both included), which overlaps with the range [0-9] and is thus clearly not intended.
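The effect of the unescaped dash can be checked directly. The sketch below compares the class from the text with two common ways of writing a literal dash, escaping it or placing it last in the class; the variable names are chosen for illustration only.

```python
import re

# The class from the text: ",-_" is read as a range from ',' to '_'.
unescaped = re.compile(r"[a-zA-Z0-9%=.,-_]")
# Two ways to make the dash literal: escape it, or put it at the end.
escaped = re.compile(r"[a-zA-Z0-9%=.,\-_]")
dash_last = re.compile(r"[a-zA-Z0-9%=.,_-]")

# Characters that fall inside the accidental range but were never meant to match.
results = {
    ch: (
        bool(unescaped.fullmatch(ch)),
        bool(escaped.fullmatch(ch)),
        bool(dash_last.fullmatch(ch)),
    )
    for ch in "<>;@"
}

for ch, flags in results.items():
    print(ch, flags)  # each line shows (True, False, False)
```

All three classes still accept a literal dash; only the unescaped variant also accepts the unintended characters.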

Don't write character ranges where there might be confusion as to which characters are included in the range.

The following example code checks whether a string is a valid 6-digit hex color.

    import re

    def is_valid_hex_color(color):
        return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

However, the A-f range matches every uppercase letter, as well as the characters [\]^_` that sit between Z and a, and thus a "color" like #XXYYZZ is considered valid.
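The problem is easy to reproduce. The sketch below repeats the flawed check and runs it on a few inputs; the sample strings are made up for illustration.

```python
import re

def is_valid_hex_color(color):
    # Flawed pattern from the example: A-f spans code points 65 ('A') to 102 ('f').
    return re.match(r'^#[0-9a-fA-f]{6}$', color) is not None

print(is_valid_hex_color("#1a2B3c"))  # True, a genuine hex color
print(is_valid_hex_color("#XXYYZZ"))  # True, although X, Y and Z are not hex digits
print(is_valid_hex_color("#[_^_]`"))  # True, punctuation between 'Z' and 'a' also passes
```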

The fix is to use an uppercase A-F range instead.

    import re

    def is_valid_hex_color(color):
        return re.match(r'^#[0-9a-fA-F]{6}$', color) is not None
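Where a range is easy to mistype, one option is to avoid letter ranges entirely. The following is a sketch of an alternative check, not part of the original fix, that tests membership in the standard library constant string.hexdigits; the constant name HEX_DIGITS is an assumption made for this example.

```python
import string

# string.hexdigits is '0123456789abcdefABCDEF', so no range can be mistyped.
HEX_DIGITS = frozenset(string.hexdigits)

def is_valid_hex_color(color):
    # Exactly one '#' followed by six hex digits.
    return (
        len(color) == 7
        and color[0] == "#"
        and all(ch in HEX_DIGITS for ch in color[1:])
    )

print(is_valid_hex_color("#1a2B3c"))  # True
print(is_valid_hex_color("#XXYYZZ"))  # False
```

Unlike a pattern anchored with $, which in Python also matches just before a single trailing newline, this version requires the string to be exactly seven characters long.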