Library Files

The library files associated with the selected assembly are organized into several sections. Each section is described below.

Reference Files

This section includes two types of library files: reference sequence files and cytoband files.

Reference sequences are the chromosome/scaffold/contig DNA sequences for a species. A reference sequence file is typically in FASTA or 2bit format. The reference sequence of a species is used for aligner index creation, variant detection and visualization of the reference sequence in the Chromosome view.
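As an illustration of the FASTA layout mentioned above, the following is a minimal sketch that reads FASTA text and reports the length of each chromosome/scaffold/contig. The function name `read_fasta` and the example sequences are hypothetical and are not part of the software; 2bit files are binary and are not handled here.

```python
from typing import Dict, List


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence name: sequence} (illustrative only)."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header line has no sequence name")
            # The name is the first whitespace-delimited token of the header.
            name = header[0]
            parts[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Hypothetical two-contig example, not real reference data.
example_fasta = ">chr1 example contig\nACGTAC\nGT\n>chr2\nTTGA\n"
sequences = read_fasta(example_fasta)
lengths = {name: len(seq) for name, seq in sequences.items()}
print(lengths)
```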

Cytoband files are used for drawing ideograms of chromosomes in the Chromosome view, including positions of cytogenetic bands if known.

Gene sets

Gene set files are required for biological interpretation analyses (e.g. GO enrichment). Genes are grouped together according to their biological function. Gene set files have to be in GMT format, where each row represents one gene set. The first column of a GMT file is the GO ID or gene set name. The second column is an optional text description. Subsequent columns are the gene symbols that belong to each gene set. Gene ontologies for various model organisms are available for automatic download from the ICM repository (source: geneontology.org). Because gene ontologies are frequently updated, geneontology.org is checked for updates quarterly. You can check for recent updates to the repository here.
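The GMT layout described above (name, description, then gene symbols, tab-separated, one gene set per row) can be sketched as a small parser. This is an illustrative sketch only: the function name `parse_gmt` and the example rows are assumptions, not content taken from the repository.

```python
from typing import Dict, List, Tuple


def parse_gmt(text: str) -> Dict[str, Tuple[str, List[str]]]:
    """Parse GMT text into {gene set name: (description, gene symbols)}."""
    gene_sets: Dict[str, Tuple[str, List[str]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected name, description and genes: {line!r}")
        # Column 1: GO ID or gene set name; column 2: description (may be empty);
        # remaining columns: gene symbols in the set.
        name, description, *genes = fields
        gene_sets[name] = (description, [g for g in genes if g])
    return gene_sets


# Illustrative rows only; real GMT files list many more genes per set.
example_gmt = (
    "GO:0006915\tapoptotic process\tTP53\tBAX\tCASP3\n"
    "GO:0007049\tcell cycle\tCDK1\tCCNB1\n"
)
gene_sets = parse_gmt(example_gmt)
for name, (description, genes) in gene_sets.items():
    print(name, description, len(genes))
```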

Variant annotations

Variant annotation databases are collections of known genomic variants (e.g. single nucleotide polymorphisms). If you have performed a variant detection study, detected variants can be searched against variant annotation library files to see if the detected variants are known from previous studies. Furthermore, you can validate detected variants against 'gold-standard' variant annotation library files. Variant annotation files are typically in VCF format.
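Conceptually, searching detected variants against a variant annotation file amounts to a lookup on chromosome, position, reference allele and alternate allele. The sketch below shows that idea on VCF text. It is a simplified illustration, not how the software performs the search: the function name `variant_keys` and both example records sets are hypothetical, and real comparisons usually also normalize variant representation first.

```python
from typing import Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, REF, ALT)


def variant_keys(vcf_text: str) -> Set[VariantKey]:
    """Collect one key per alternate allele from the data lines of a VCF."""
    keys: Set[VariantKey] = set()
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):  # ALT may list several alleles
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Hypothetical records for illustration only.
known_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\tknown_1\tA\tG\t.\t.\t.\n"
    "chr1\t2000\tknown_2\tC\tT,G\t.\t.\t.\n"
)
detected_vcf = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\n"
    "chr1\t3000\t.\tG\tA\t40\tPASS\t.\n"
)

known = variant_keys(known_vcf)
detected = variant_keys(detected_vcf)
previously_reported = detected & known  # detected variants found in the database
novel = detected - known                # detected variants absent from it
print(sorted(previously_reported), sorted(novel))
```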

Variant annotation databases from commonly used sources (e.g. dbSNP) are available for automatic download from the Connected Multiomics repository. Because variant annotation databases are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the repository.

SnpEff variant databases

SnpEff¹ is a variant annotation and effect prediction tool that requires its own variant annotation files, separate from the other variant annotation library files. To use SnpEff, add its library files to this section.

VEP database

The Ensembl Variant Effect Predictor (VEP) is another variant annotation and effect prediction tool that requires its own annotation files, separate from the variant annotation library files. To use VEP, add its library files to this section.

Annotation models

Annotation models describe genomic features (e.g. genes, transcripts, microRNAs) for a specific version of the reference sequence. Annotation models contain labels (e.g. gene ID) and genomic coordinates (e.g. chromosome, start & stop position) for each feature.

Annotation models will appear in separate tables (Figure 1). If you have multiple versions of annotation models from the same source, it is advisable to distinguish them by their date or version number.

Annotation models from commonly used sources (e.g. Ensembl) are available for automatic download from the ICM repository. Because annotation models are frequently updated, these sources are checked for updates quarterly. You can check for recent updates to the ICM repository.

Annotation models are used for quantification in gene expression analyses, annotating detected variants (e.g. to predict amino acid changes), visualizations in the Chromosome view, and generating coverage reports. Typical file formats include GTF, GFF, GFF3 and BED.
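To make the idea of labels plus genomic coordinates concrete, the sketch below parses a single feature line in GTF, one of the formats listed above. GTF lines have nine tab-separated columns, with the labels stored as key "value" pairs in the last column. The function name `parse_gtf_line` and the example feature are hypothetical and do not come from any real annotation model.

```python
import re
from typing import Any, Dict

# Attributes in column 9 look like: gene_id "X"; gene_name "Y";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_line(line: str) -> Dict[str, Any]:
    """Split one GTF feature line into coordinates and labels."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    seqname, _source, feature, start, end, _score, strand, _frame, attrs = columns
    return {
        "chromosome": seqname,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based and inclusive
        "stop": int(end),
        "strand": strand,
        "labels": dict(ATTRIBUTE.findall(attrs)),
    }


# Hypothetical feature for illustration only.
example_line = (
    'chr1\texample\tgene\t1000\t2000\t.\t+\t.\t'
    'gene_id "GENE0001"; gene_name "EXAMPLE1";'
)
record = parse_gtf_line(example_line)
print(record)
```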

The arrows next to the annotation model name expand or collapse each table. Two of the annotation models displayed in Figure 1 are different versions from the same source (Ensembl), distinguishable by their version number (release 105 vs. release 98).

microRNA targets maps

A microRNA targets map is required for the Get microRNA targets task. Databases from TargetScan for various model organisms are available for automatic download from the ICM repository (source: https://www.targetscan.org/vert_80/).

References

1. Cingolani P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012; 6(2):80-92. PMID: 22728672
