Filter Variants With Databases
The Filter variants with databases task is a variant-filtering task that filters variants against built-in databases to remove noise from common polymorphisms and benign mutations. The available built-in databases are:
Primate AI [1]: Predicts the pathogenicity of missense mutations. Percentile scores range from 0 to 1, where 0 is benign and 1 is most pathogenic. Global percentile is based on a global ranking of all missense variants in the human genome, while Percentile is based on a ranking of all missense variants within the same gene.
Promoter AI [2]: Predicts the expression-altering consequences of variants in promoter regions. Scores range from -1 to 1. A positive score indicates that the variant likely enhances transcription; a negative score indicates that the variant likely represses transcription.
gnomAD [3]: The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available for the wider scientific community. Filtering by minor allele frequency (MAF) is supported.
DRAGEN Haplotype Database: Proprietary haplotype database built from a panel of 256 population haplotypes. Filtering by minor allele frequency (MAF) is supported.
The databases in the Filter variants with databases task support only the human hg38 assembly.
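The difference between the Primate AI Global percentile and Percentile can be illustrated with a small Python sketch. It is not the Primate AI implementation: the rank formula (fraction of scores less than or equal to a given score), the `(gene, raw_score)` layout, the gene names, and the function names are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile_ranks(scores):
    """Return, for each score, the fraction of scores less than or equal to it.

    Assumption: a simple rank fraction; the source does not state the formula.
    """
    ordered = sorted(scores)
    return [bisect_right(ordered, s) / len(ordered) for s in scores]


def global_and_gene_percentiles(variants):
    """variants: list of (gene, raw_score) tuples (hypothetical layout).

    Returns a list of (gene, raw_score, global_percentile, gene_percentile).
    """
    # Global percentile: rank every variant against all variants.
    global_pct = percentile_ranks([score for _, score in variants])

    # Group positions by gene so each gene is ranked on its own.
    by_gene = defaultdict(list)
    for index, (gene, _) in enumerate(variants):
        by_gene[gene].append(index)

    # Per-gene percentile: rank each variant only against its own gene.
    gene_pct = [0.0] * len(variants)
    for indices in by_gene.values():
        ranks = percentile_ranks([variants[i][1] for i in indices])
        for i, rank in zip(indices, ranks):
            gene_pct[i] = rank

    return [
        (gene, score, global_pct[i], gene_pct[i])
        for i, (gene, score) in enumerate(variants)
    ]


# Hypothetical scores for two made-up genes.
example = [("GENE_A", 0.2), ("GENE_A", 0.4), ("GENE_B", 0.8), ("GENE_B", 0.9)]
result = global_and_gene_percentiles(example)
```

In this toy data, the variant scoring 0.4 sits in the middle of the global ranking but is the highest-scoring variant within GENE_A, so the two percentiles can lead to different filtering decisions for the same variant.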
Running Filter Variants With Databases
The Filter variants with databases task can be invoked from a Variants data node:
Click to select a Variants data node.
Select Variant analysis from the context-sensitive menu, then select Filter variants with databases.
On the task setup page:
Select whether to filter out or filter in your input variants based on their presence in a reference database.
Choose a reference database for filtering.
Define criteria to include or exclude variants from the selected reference database. This filtered list will be used to filter your input variants.
Optionally, click AND to add more reference databases for filtering. When multiple reference databases are used, the filters defined by each reference database subset are applied sequentially, in the order in which they were defined.
Click Finish.
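The filtering logic described in these steps can be sketched in Python as follows. This is a conceptual illustration only, not the task's actual implementation: the variant key layout `(chrom, pos, ref, alt)`, the dictionary-based databases, the `maf` field, the 0.01 threshold, and the assumption that one filter mode applies to every database in the chain are all hypothetical.

```python
def select_reference_subset(database, criterion):
    """Return the keys of database records that satisfy the criterion."""
    return {key for key, record in database.items() if criterion(record)}


def apply_filters(input_variants, steps, mode="out"):
    """Apply each (database, criterion) step in the order given.

    mode="out" drops input variants found in the reference subset;
    mode="in" keeps only input variants found in the reference subset.
    """
    if mode not in ("in", "out"):
        raise ValueError("mode must be 'in' or 'out'")
    remaining = list(input_variants)
    for database, criterion in steps:
        subset = select_reference_subset(database, criterion)
        if mode == "out":
            remaining = [v for v in remaining if v not in subset]
        else:
            remaining = [v for v in remaining if v in subset]
    return remaining


# Hypothetical reference databases keyed by (chrom, pos, ref, alt).
gnomad = {
    ("chr1", 100, "A", "G"): {"maf": 0.12},
    ("chr1", 200, "C", "T"): {"maf": 0.0001},
}
haplotype_db = {
    ("chr2", 300, "G", "A"): {"maf": 0.30},
}

inputs = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 300, "G", "A"),
    ("chr3", 400, "T", "C"),
]


def is_common(record):
    """Hypothetical criterion: treat MAF >= 1% as a common polymorphism."""
    return record["maf"] >= 0.01


# Filter out variants that are common in either database, one database at a time.
kept = apply_filters(inputs, [(gnomad, is_common), (haplotype_db, is_common)], mode="out")
```

Here the criterion first narrows each reference database to a subset, and only that subset is compared against the input variants. The rare gnomAD variant at chr1:200 therefore survives the filter even though it is present in the database.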

Filter Variants With Databases Report
When the task is completed, double-click the Variants node to open a filter summary report. The report consists of a table in which each row is a sample and the columns are filter summary statistics: the number of variants that passed the filter, the number that failed, and the percentage of variants that passed.
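As a rough illustration of how the summary statistics relate to one another, the following Python sketch builds one row per sample from passed and failed counts. The sample names, counts, and column names are hypothetical and do not reflect the exact labels used in the report.

```python
def summarize(sample_results):
    """sample_results: dict mapping sample name -> (passed, failed) counts.

    Returns one row per sample with the percentage of variants that passed.
    """
    rows = []
    for sample, (passed, failed) in sample_results.items():
        total = passed + failed
        # Guard against samples with no variants to avoid division by zero.
        percent = 100.0 * passed / total if total else 0.0
        rows.append(
            {
                "sample": sample,
                "passed": passed,
                "failed": failed,
                "percent_passed": round(percent, 2),
            }
        )
    return rows


# Hypothetical counts for two samples.
report = summarize({"Sample_1": (750, 250), "Sample_2": (0, 0)})
```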

References
[1] Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023). DOI: 10.1126/science.abn8197
[2] Jaganathan, K. et al. Predicting expression-altering promoter mutations with deep learning. Science 389, eads7373 (2025). DOI: 10.1126/science.ads7373
[3] Karczewski, K.J., Francioli, L.C., Tiao, G. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). https://doi.org/10.1038/s41586-020-2308-7. https://gnomad.broadinstitute.org/
