{"meta":{"title":"database create","intro":"Create a CodeQL database for a source tree that can be analyzed using\none of the CodeQL products.","product":"Security and code quality","breadcrumbs":[{"href":"/en/code-security","title":"Security and code quality"},{"href":"/en/code-security/reference","title":"Reference"},{"href":"/en/code-security/reference/code-scanning","title":"Code scanning"},{"href":"/en/code-security/reference/code-scanning/codeql","title":"CodeQL"},{"href":"/en/code-security/reference/code-scanning/codeql/codeql-cli-manual","title":"CodeQL CLI manual"},{"href":"/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-create","title":"database create"}],"documentType":"article"},"body":"# database create\n\nCreate a CodeQL database for a source tree that can be analyzed using\none of the CodeQL products.\n\n> \\[!NOTE]\n> This content describes the most recent release of the CodeQL CLI. For more information about this release, see <https://github.com/github/codeql-cli-binaries/releases>.\n>\n> To see details of the options available for this command in an earlier release, run the command with the <span style=\"white-space: nowrap;\">`--help`</span> option in your terminal.\n\n## Synopsis\n\n```shell copy\ncodeql database create [--language=<lang>[,<lang>...]] [--github-auth-stdin] [--github-url=<url>] [--source-root=<dir>] [--threads=<num>] [--ram=<MB>] [--command=<command>] [--extractor-option=<extractor-option-name=value>] <options>... -- <database>\n```\n\n## Description\n\nCreate a CodeQL database for a source tree that can be analyzed using\none of the CodeQL products.\n\n## Options\n\n### Primary Options\n\n#### `<database>`\n\n\\[Mandatory] Path to the CodeQL database to create. 
This directory will\nbe created, and *must not* already exist (but its parent must).\n\nIf the `--db-cluster` option is given, this will not be a database\nitself, but a directory that will *contain* databases for several\nlanguages built from the same source root.\n\nIt is important that this directory is not in a location that the build\nprocess will interfere with. For instance, the `target` directory of a\nMaven project would not be a suitable choice.\n\n#### `--[no-]overwrite`\n\n\\[Advanced] If the database already exists, delete it and proceed with\nthis command instead of failing. If the directory exists, but it does\nnot look like a database, an error will be thrown.\n\n#### `--[no-]force-overwrite`\n\n\\[Advanced] If the database already exists, delete it even if it does\nnot look like a database and proceed with this command instead of\nfailing. This option should be used with caution as it may recursively\ndelete the entire database directory.\n\n#### `--codescanning-config=<file>`\n\n\\[Advanced] Read a Code Scanning configuration file specifying options\non how to create the CodeQL databases and what queries to run in later\nsteps. For more details on the format of this configuration file, refer\nto [Workflow configuration options for code scanning](/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning). 
To run queries from\nthis file in a later step, invoke [codeql database analyze](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-analyze) without any other queries specified.\n\n#### `--[no-]db-cluster`\n\nInstead of creating a single database, create a \"cluster\" of databases\nfor different languages, each of which is a subdirectory of the\ndirectory given on the command line.\n\n#### `-l, --language=<lang>[,<lang>...]`\n\nThe language that the new database will be used to analyze.\n\nUse [codeql resolve languages](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/resolve-languages) to get a list of the pluggable language extractors found on the search path.\n\nWhen the `--db-cluster` option is given, this can appear multiple times,\nor the value can be a comma-separated list of languages.\n\nIf this option is omitted, and the source root being analyzed is a\ncheckout of a GitHub repository, the CodeQL CLI will make a call to the\nGitHub API to attempt to automatically determine what languages to\nanalyze. Note that to be able to do this, a GitHub personal access token must be\nsupplied either in the environment variable GITHUB\\_TOKEN or via standard\ninput using the `--github-auth-stdin` option.\n\n#### `--build-mode=<mode>`\n\nThe build mode that will be used to create the database.\n\nChoose your build mode based on the language you are analyzing:\n\n`none`: The database will be created without building the source root.\nAvailable for C#, Java, JavaScript/TypeScript, Python, and Ruby.\n\n`autobuild`: The database will be created by attempting to automatically\nbuild the source root. Available for C/C++, C#, Go, Java/Kotlin, and\nSwift.\n\n`manual`: The database will be created by building the source root using\na manually specified build command. 
Available for C/C++, C#, Go,\nJava/Kotlin, and Swift.\n\nWhen creating a database with `--command`, there is no need to\nadditionally specify `--build-mode manual`.\n\nAvailable since `v2.16.4`.\n\n#### `-s, --source-root=<dir>`\n\n\\[Default: .] The root source code directory. In many cases, this will\nbe the checkout root. Files within it are considered to be the primary\nsource files for this database. In some output formats, files will be\nreferred to by their relative path from this directory.\n\n#### `-j, --threads=<num>`\n\nUse this many threads for the import operation, and pass it as a hint to\nany invoked build commands.\n\nDefaults to 1. You can pass 0 to use one thread per core on the machine,\nor -*N* to leave *N* cores unused (while still using at least one\nthread).\n\n#### `-M, --ram=<MB>`\n\nUse this much memory for the import operation, and pass it as a hint to\nany invoked build commands.\n\n#### `-c, --command=<command>`\n\nFor compiled languages, build commands that will cause the compiler to\nbe invoked on the source code to analyze. These commands will be\nexecuted under an instrumentation environment that allows analysis of\ngenerated code and (in some cases) standard libraries.\n\nIf no build command is specified, the command attempts to figure out\nautomatically how to build the source tree, based on heuristics from the\nselected language pack.\n\nBeware that some combinations of multiple languages *require* an\nexplicit build command to be specified.\n\n#### `--no-cleanup`\n\n\\[Advanced] Suppress all database cleanup after finalization. Useful\nfor debugging purposes.\n\n#### `--no-pre-finalize`\n\n\\[Advanced] Skip any pre-finalize script specified by the active CodeQL\nextractor.\n\n#### `--[no-]skip-empty`\n\n\\[Advanced] Output a warning instead of failing if a database is empty\nbecause no source code was seen during the build. 
The empty database\nwill be left unfinalized.\n\n#### `--[no-]linkage-aware-import`\n\n\\[Advanced] Controls whether [codeql dataset import](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/dataset-import) is linkage-aware *(default)* or not. On projects where this part of database creation\nconsumes too much memory, disabling this option may help them progress\nat the expense of database completeness.\n\nAvailable since `v2.15.3`.\n\n### Baseline calculation options\n\n#### `--[no-]calculate-baseline`\n\n\\[Advanced] Calculate baseline information about the code being\nanalyzed and add it to the database. By default, this is enabled unless\nthe source root is the root of a filesystem. This flag can be used to\neither disable, or force the behavior to be enabled even in the root of\nthe filesystem.\n\n#### `--[no-]sublanguage-file-coverage`\n\n\\[GitHub.com and GitHub Enterprise Server v3.12.0+ only] Use\nsub-language file coverage information. This calculates, displays, and\nexports separate file coverage information for languages which share a\nCodeQL extractor like C and C++, Java and Kotlin, and JavaScript and\nTypeScript.\n\nAvailable since `v2.15.2`.\n\n### Extractor selection options\n\n#### `--search-path=<dir>[:<dir>...]`\n\nA list of directories under which extractor packs may be found. 
The\ndirectories can either be the extractor packs themselves or directories\nthat contain extractors as immediate subdirectories.\n\nIf the path contains multiple directory trees, their order defines\nprecedence between them: if the target language is matched in more than\none of the directory trees, the one given first wins.\n\nThe extractors bundled with the CodeQL toolchain itself will always be\nfound, but if you need to use separately distributed extractors you need\nto give this option (or, better yet, set up `--search-path` in a\nper-user configuration file).\n\n(Note: On Windows the path separator is `;`).\n\n### Options to configure how to call the GitHub API to auto-detect languages.\n\n#### `-a, --github-auth-stdin`\n\nAccept a GitHub Apps token or personal access token via standard input.\n\nThis overrides the GITHUB\\_TOKEN environment variable.\n\n#### `-g, --github-url=<url>`\n\nURL of the GitHub instance to use. If omitted, the CLI will attempt to\nautodetect this from the checkout path and if this is not possible\ndefault to <https://github.com/>\n\n### Options to configure the package manager.\n\n#### `--registries-auth-stdin`\n\nAuthenticate to GitHub Enterprise Server Container registries by passing\na comma-separated list of \\<registry\\_url>=\\<token> pairs.\n\nFor example, you can pass\n`https://containers.GHEHOSTNAME1/v2/=TOKEN1,https://containers.GHEHOSTNAME2/v2/=TOKEN2`\nto authenticate to two GitHub Enterprise Server instances.\n\nThis overrides the CODEQL\\_REGISTRIES\\_AUTH and GITHUB\\_TOKEN environment\nvariables. 
If you only need to authenticate to the github.com Container\nregistry, you can instead authenticate using the simpler\n`--github-auth-stdin` option.\n\n### Low-level dataset cleanup options\n\n#### `--max-disk-cache=<MB>`\n\nSet the maximum amount of space that the disk cache for intermediate\nquery results can use.\n\nIf this size is not configured explicitly, the evaluator will try to use\na \"reasonable\" amount of cache space, based on the size of the dataset\nand the complexity of the queries. Explicitly setting a higher limit\nthan this default usage will enable additional caching which can speed\nup later queries.\n\n#### `--min-disk-free=<MB>`\n\n\\[Advanced] Set target amount of free space on file system.\n\nIf `--max-disk-cache` is not given, the evaluator will try hard to\ncurtail disk cache usage if the free space on the file system drops\nbelow this value.\n\n#### `--min-disk-free-pct=<pct>`\n\n\\[Advanced] Set target fraction of free space on file system.\n\nIf `--max-disk-cache` is not given, the evaluator will try hard to\ncurtail disk cache usage if the free space on the file system drops\nbelow this percentage.\n\n#### `--cache-cleanup=<mode>`\n\nSelect how aggressively to trim the cache. 
Choices include:\n\n`clear`: Remove the entire cache, trimming down to the state of a\nfreshly extracted dataset\n\n`trim` *(default)*: Trim everything except explicitly \"cached\"\npredicates.\n\n`fit`: Simply make sure the defined size limits for the disk cache are\nobserved, deleting as many intermediates as necessary.\n\n`overlay`: Trim to just the data that will be useful when evaluating\nagainst an overlay.\n\n#### `--cleanup-upgrade-backups`\n\nDelete any backup directories resulting from database upgrades.\n\n### Tracing options\n\n#### `--no-tracing`\n\n\\[Advanced] Do not trace the specified command, instead rely on it to\nproduce all necessary data directly.\n\n#### `--extra-tracing-config=<tracing-config.lua>`\n\n\\[Advanced] The path to a tracer configuration file. It may be used to\nmodify the behavior of the build tracer. It may be used to pick out\ncompiler processes that run as part of the build command, and trigger\nthe execution of other tools. The extractors will provide default tracer\nconfiguration files that should work in most situations.\n\n### Build command customization options\n\n#### `--working-dir=<dir>`\n\n\\[Advanced] The directory in which the specified command should be\nexecuted. If this argument is not provided, the command is executed in\nthe value of `--source-root` passed to codeql database create, if one exists. If no `--source-root` argument is provided, the command is executed in the\ncurrent working directory.\n\n#### `--no-run-unnecessary-builds`\n\n\\[Advanced] Only run the specified build command(s) if a database under\nconstruction uses an extractor that depends on tracing a build process.\nIf this option is not given, the command will be executed even when\nCodeQL doesn't need it, on the assumption that you need its side\neffects for other reasons.\n\n### Options to control extractor behavior\n\n#### `-O, --extractor-option=<extractor-option-name=value>`\n\nSet options for CodeQL extractors. 
`extractor-option-name` should be of\nthe form extractor\\_name.group1.group2.option\\_name or\ngroup1.group2.option\\_name. If `extractor-option-name` starts with an\nextractor name, the indicated extractor must declare the option\ngroup1.group2.option\\_name. Otherwise, any extractor that declares the\noption group1.group2.option\\_name will have the option set. `value` can\nbe any string that does not contain a newline.\n\nYou can use this command-line option repeatedly to set multiple\nextractor options. If you provide multiple values for the same extractor\noption, the behavior depends on the type that the extractor option\nexpects. String options will use the last value provided. Array options\nwill use all the values provided, in order. Extractor options specified\nusing this command-line option are processed after extractor options\ngiven via `--extractor-options-file`.\n\nWhen passed to [codeql database init](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-init) or `codeql database begin-tracing`, the options will only be\napplied to the indirect tracing environment. If your workflow also makes\ncalls to\n[codeql database trace-command](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-trace-command) then the options also need to be passed there if desired.\n\nSee <https://codeql.github.com/docs/codeql-cli/extractor-options> for\nmore information on CodeQL extractor options, including how to list the\noptions declared by each extractor.\n\n#### `--extractor-options-file=<extractor-options-bundle-file>`\n\nSpecify extractor option bundle files. An extractor option bundle file\nis a JSON file (extension `.json`) or YAML file (extension `.yaml` or\n`.yml`) that sets extractor options. The file must have the top-level\nmap key 'extractor' and, under it, extractor names as second-level map\nkeys. 
Further levels of maps represent nested extractor groups, and\nstring and array options are map entries with string and array values.\n\nExtractor option bundle files are read in the order they are specified.\nIf different extractor option bundle files specify the same extractor\noption, the behavior depends on the type that the extractor option\nexpects. String options will use the last value provided. Array options\nwill use all the values provided, in order. Extractor options specified\nusing this command-line option are processed before extractor options\ngiven via `--extractor-option`.\n\nWhen passed to [codeql database init](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-init) or `codeql database begin-tracing`, the options will only be\napplied to the indirect tracing environment. If your workflow also makes\ncalls to\n[codeql database trace-command](/en/code-security/reference/code-scanning/codeql/codeql-cli-manual/database-trace-command) then the options also need to be passed there if desired.\n\nSee <https://codeql.github.com/docs/codeql-cli/extractor-options> for\nmore information on CodeQL extractor options, including how to list the\noptions declared by each extractor.\n\n### Common options\n\n#### `-h, --help`\n\nShow this help text.\n\n#### `-J=<opt>`\n\n\\[Advanced] Give option to the JVM running the command.\n\n(Beware that options containing spaces will not be handled correctly.)\n\n#### `-v, --verbose`\n\nIncrementally increase the number of progress messages printed.\n\n#### `-q, --quiet`\n\nIncrementally decrease the number of progress messages printed.\n\n#### `--verbosity=<level>`\n\n\\[Advanced] Explicitly set the verbosity level to one of errors,\nwarnings, progress, progress+, progress++, progress+++. 
Overrides `-v`\nand `-q`.\n\n#### `--logdir=<dir>`\n\n\\[Advanced] Write detailed logs to one or more files in the given\ndirectory, with generated names that include timestamps and the name of\nthe running subcommand.\n\n(To write a log file with a name you have full control over, instead\ngive `--log-to-stderr` and redirect stderr as desired.)\n\n#### `--common-caches=<dir>`\n\n\\[Advanced] Controls the location of cached data on disk that will\npersist between several runs of the CLI, such as downloaded QL packs and\ncompiled query plans. If not set explicitly, this defaults to a\ndirectory named `.codeql` in the user's home directory; it will be\ncreated if it doesn't already exist.\n\nAvailable since `v2.15.2`."}