
# Filter Variants With Databases

The Filter variants with databases task is a variant-filtering task that filters variants against built-in databases to remove noise from common polymorphisms and benign mutations. The available built-in databases are:

* **Primate AI** \[1]: Predicts the pathogenicity of missense mutations. Percentile scores range from 0 to 1, where 0 is the most benign and 1 is the most pathogenic. *Global percentile* is based on a global ranking of all missense variants in the human genome, while *Percentile* is based on a ranking of all missense variants within the same gene.
* **Promoter AI** \[2]: Predicts the expression-altering consequences of variants in promoter regions. *Scores* range from -1 to 1. A positive score indicates that the variant likely enhances transcription; a negative score indicates that it likely represses transcription.
* **gnomAD** \[3]: The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. Filtering by *minor allele frequency (MAF)* is supported.
* **DRAGEN Haplotype Database**: Proprietary haplotype database built from a panel of 256 population haplotypes. Filtering by *minor allele frequency (MAF)* is supported.
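The difference between *Global percentile* and *Percentile* can be illustrated with a small sketch. This is not how the Primate AI scores are actually produced; the variant names, gene names, raw scores, and the `percentile_ranks` helper are all hypothetical and exist only to show how the same variant can rank differently genome-wide and within its own gene.

```python
from collections import defaultdict

def percentile_ranks(scores):
    """Rank each key from 0 to 1 by the share of other scores strictly below it.

    Hypothetical helper for illustration only.
    """
    values = list(scores.values())
    denom = max(len(values) - 1, 1)  # avoid division by zero for a single score
    return {key: sum(v < s for v in values) / denom for key, s in scores.items()}

# Made-up toy data: variant id -> (gene, raw pathogenicity score)
variants = {
    "v1": ("GENE_A", 0.2),
    "v2": ("GENE_A", 0.4),
    "v3": ("GENE_B", 0.6),
    "v4": ("GENE_B", 0.8),
    "v5": ("GENE_B", 0.9),
}

# Global ranking: every variant is compared with every other variant.
global_pct = percentile_ranks({vid: score for vid, (_, score) in variants.items()})

# Per-gene ranking: each variant is compared only with variants in the same gene.
by_gene = defaultdict(dict)
for vid, (gene, score) in variants.items():
    by_gene[gene][vid] = score

gene_pct = {}
for gene_scores in by_gene.values():
    gene_pct.update(percentile_ranks(gene_scores))

print(global_pct["v2"], gene_pct["v2"])
```

In this toy data, `v2` ranks low genome-wide but highest within `GENE_A`, which is why the two percentiles can lead to different filtering decisions for the same variant.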

{% hint style="warning" %}
The databases in the Filter variants with databases task support only the human hg38 assembly.
{% endhint %}

## Running Filter Variants With Databases

The Filter variants with databases task can be invoked from a Variants data node:

* Click to select a **Variants** data node.
* Select **Variant analysis** from the context-sensitive menu, then select **Filter variants with databases**.
* On the task setup page:
  * Select whether to **filter out** or **filter in** your input variants based on their presence in a reference database.
  * Choose a **reference database** for filtering.
  * **Define criteria** to include or exclude variants from the selected reference database. This filtered list will be used to filter your input variants.
  * Optionally, click **AND** to add more reference databases for filtering. When multiple reference databases are used, the filter built from each reference database subset is applied sequentially, in the defined order.
  * Click **Finish**.
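The logic of the steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the task's implementation: the variant key `(chrom, pos, ref, alt)`, the `maf` annotation field, the `filter_variants` function, and the toy database contents are all hypothetical, and the sketch assumes a single filter-in or filter-out mode applies to every database in the chain.

```python
from typing import Callable, Dict, List, Tuple

Variant = Tuple[str, int, str, str]           # hypothetical key: (chrom, pos, ref, alt)
Database = Dict[Variant, dict]                # variant -> annotations, e.g. {"maf": 0.2}
Step = Tuple[Database, Callable[[dict], bool]]

def filter_variants(variants: List[Variant], steps: List[Step],
                    mode: str = "filter_out") -> List[Variant]:
    """Apply each (database, criterion) step in order to the input variants."""
    if mode not in ("filter_out", "filter_in"):
        raise ValueError("mode must be 'filter_out' or 'filter_in'")
    kept = list(variants)
    for database, criterion in steps:
        # Subset of the reference database that matches the defined criteria.
        subset = {v for v, annotations in database.items() if criterion(annotations)}
        if mode == "filter_out":
            kept = [v for v in kept if v not in subset]   # drop variants present in subset
        else:
            kept = [v for v in kept if v in subset]       # keep only variants in subset
    return kept

# Made-up toy databases and input variants.
gnomad = {
    ("chr1", 100, "A", "G"): {"maf": 0.20},
    ("chr1", 200, "C", "T"): {"maf": 0.001},
}
haplotype_db = {("chr2", 300, "G", "A"): {"maf": 0.10}}
inputs = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr3", 400, "T", "C"),
]

def is_common(annotations: dict) -> bool:
    """Illustrative criterion: MAF of at least 1%."""
    return annotations["maf"] >= 0.01

rare_only = filter_variants(inputs, [(gnomad, is_common), (haplotype_db, is_common)])
common_in_gnomad = filter_variants(inputs, [(gnomad, is_common)], mode="filter_in")
print(rare_only)
print(common_in_gnomad)
```

With these toy inputs, filtering out removes the two variants that are common in either database, while filtering in against gnomAD alone keeps only the variant that is common there.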

<figure><img src="/files/ClFkT03hAb8pQwPOcxrP" alt=""><figcaption></figcaption></figure>

## Filter Variants With Databases Report

When the task completes, double-click the **Variants** node to open a filter summary report. The report is a table in which each row is a sample and the columns are filter summary statistics: the number of variants that passed the filter, the number that failed, and the percentage that passed.

<figure><img src="/files/sIPZEX2C0wBIxMQSYGEQ" alt=""><figcaption></figcaption></figure>
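The summary statistics in the report follow directly from the per-sample variant counts. The sketch below shows the arithmetic; the `filter_summary` function, the field names, and the sample counts are hypothetical and do not reflect the report's actual column names.

```python
def filter_summary(total: int, passed: int) -> dict:
    """Compute passed/failed counts and the percentage passed for one sample."""
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    failed = total - passed
    percent_passed = 100.0 * passed / total if total else 0.0
    return {"passed": passed, "failed": failed, "percent_passed": percent_passed}

# Made-up counts: sample name -> (total input variants, variants that passed)
counts = {"sample_1": (200, 150), "sample_2": (80, 20)}
report = {sample: filter_summary(total, passed)
          for sample, (total, passed) in counts.items()}

for sample, row in report.items():
    print(sample, row)
```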

## References

1. Hong Gao et al., The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8197 (2023). DOI:10.1126/science.abn8197
2. Kishore Jaganathan et al., Predicting expression-altering promoter mutations with deep learning. Science 389, eads7373 (2025). DOI:10.1126/science.ads7373
3. Karczewski, K.J., Francioli, L.C., Tiao, G. *et al.* The mutational constraint spectrum quantified from variation in 141,456 humans. *Nature* 581, 434–443 (2020). <https://doi.org/10.1038/s41586-020-2308-7>. <https://gnomad.broadinstitute.org/>

