/** * @name Incomplete regular expression for hostnames * @description Matching a URL or hostname against a regular expression that contains an unescaped dot as part of the hostname might match more hostnames than expected. * @kind problem * @problem.severity warning * @security-severity 5.9 * @precision high * @id py/incomplete-hostname-regexp * @tags correctness * security * external/cwe/cwe-20 */ import python import semmle.python.regex private string commonTopLevelDomainRegex() { result = "com|org|edu|gov|uk|net|io" } /** * Holds if `pattern` is a regular expression pattern for URLs with a host matched by `hostPart`, * and `pattern` contains a subtle mistake that allows it to match unexpected hosts. */ bindingset[pattern] predicate isIncompleteHostNameRegExpPattern(string pattern, string hostPart) { hostPart = pattern .regexpCapture("(?i).*" + // an unescaped single `.` "(?