Biological interpretation

Connected Multiomics offers biological interpretation tools that can provide additional insight into lists of genes, such as significantly different genes between experimental groups.

Gene Set Enrichment

What is Gene set enrichment?

Enrichment analysis is a technique commonly used to add biological context to a list of genes, such as a list of significant genes filtered from a differential analysis report. The procedure is based on assigning genes to groups and then finding overrepresented groups in the filtered gene list using Fisher's exact test.
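The over-representation test described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by Connected Multiomics: it assumes a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution), and the function name and all numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(list_size, hits_in_list, set_size, background_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability of finding at least `hits_in_list` members of a
    gene set in a filtered list of `list_size` genes, when the list is drawn
    from `background_size` genes of which `set_size` belong to the gene set.
    """
    max_hits = min(list_size, set_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(set_size, k) * comb(background_size - set_size, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 background genes, a 100-gene set, and a
# filtered list of 200 genes that contains 8 members of the set.
p = enrichment_p_value(list_size=200, hits_in_list=8,
                       set_size=100, background_size=20000)
print(f"p = {p:.3g}")
```

Only one gene set is tested here; in practice the test is repeated for every gene set, so the resulting p-values need a multiple-testing correction.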

GSEA

GSEA is a bioinformatics tool that determines whether a set of genes (e.g. a gene ontology (GO) group or a pathway) shows statistically significant, concordant differences between two experimental groups (1,2). Briefly, the goal of GSEA is to determine whether the genes belonging to a gene set are randomly distributed throughout the ranked (by expression) list of all the genes that should be taken into consideration (e.g. gene model), or are primarily found at the top or at the bottom of the list.

Prerequisites

To run GSEA, your project has to contain at least one categorical factor with at least two levels (e.g. Treated and Control). If you are running GSEA on RNA-seq data, note that some common normalization transformations, such as fragments/reads per kilobase of transcript per million mapped reads (FPKM/RPKM) or transcripts per million (TPM), are not considered suitable for GSEA. Instead, you should use an approach such as DESeq2 normalisation, trimmed mean of M-values (TMM), or geometric mean.
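To give a feel for what a between-sample normalisation of this kind does, here is a minimal median-of-ratios sketch in the spirit of DESeq2 size factors. It is an illustration under simplifying assumptions (genes with any zero count are skipped when estimating the factors), not the code run by the normalisation task; the function names and counts are hypothetical.

```python
from math import exp, log
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene; genes with a zero count are skipped.
    log_geo_means = {
        gene: sum(log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    if not log_geo_means:
        raise ValueError("every gene has at least one zero count")
    factors = []
    for j in range(n_samples):
        log_ratios = [log(counts[g][j]) - m for g, m in log_geo_means.items()]
        factors.append(exp(median(log_ratios)))
    return factors

def normalise(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}

# Hypothetical data: sample 2 was sequenced roughly twice as deeply as sample 1.
counts = {"A": [10, 20], "B": [100, 200], "C": [5, 10], "D": [0, 3]}
print(size_factors(counts))
print(normalise(counts)["A"])
```

Unlike FPKM/RPKM or TPM, the scaling here is estimated from the ratios between samples, so a gene with the same underlying abundance ends up with comparable values across samples.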

Running GSEA

To launch GSEA, select the data node with normalised data and then go to Biological interpretation > GSEA.

Use the first dialog to specify gene sets. You can run GSEA on pathways (currently based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways) or on other gene set databases. When using the KEGG option, the KEGG database (i.e. the species) is set automatically, based on the upstream nodes. The Gene set size option allows you to restrict your analysis to gene sets of a certain size (i.e. number of genes).
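The effect of the Gene set size option can be pictured with the following sketch. It assumes that size means the number of genes listed for the set in the gene set file; the function name, the gene sets, and the size limits are all hypothetical.

```python
def filter_by_size(gene_sets, min_size, max_size):
    """Keep only the gene sets whose number of genes is within the limits.

    gene_sets: dict mapping gene set ID -> list of gene IDs.
    """
    return {
        set_id: genes
        for set_id, genes in gene_sets.items()
        if min_size <= len(set(genes)) <= max_size
    }

# Hypothetical gene sets and limits.
gene_sets = {
    "small": ["A", "B"],
    "medium": ["A", "B", "C", "D"],
    "large": [f"G{i}" for i in range(10)],
}
kept = filter_by_size(gene_sets, min_size=3, max_size=5)
print(sorted(kept))
```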

Once your choices are made, push Next to proceed.

In the second part of the setup, pick the experimental factor for GSEA.

The GSEA task computes only one factor at a time. If you select more than one factor, the computation will be performed on each one individually. Click Next to set up comparisons.

The box on the left side displays the categories of the selected factor (shown as Factor). Use the arrow buttons (>) to move one of the categories to the Denominator box (that category is interpreted as the reference category) and another category to the Numerator box. Confirm your selection by pushing the Add comparison button, and the comparison will be added to the Comparisons table.

The Low value filter is turned on by default and removes all genes with a lowest average coverage of 1.0 or below; if a filter feature task was performed before this task, the default low-value filter is set to None.

Push Finish to launch GSEA with the default settings. Each comparison will be performed individually and generate its own section in the report.

Click on the Configure icon to access the advanced options.

The number of data permutations (needed to calculate the normalised enrichment scores) can be controlled using the Permutations option. Each permutation randomly reassigns the group labels for a given gene; the resulting random order is then used to compute the score for each gene. Finally, indicate whether or not the input data is in log scale.
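The role of the permutations can be sketched as follows. The sketch assumes the usual GSEA convention: the observed score is divided by the mean of the permuted scores that have the same sign, and the p-value is the fraction of those permuted scores that are at least as extreme. The function names, the toy scoring function, and the data are hypothetical; this is not the task's own code.

```python
import random
from statistics import mean

def permutation_null(labels, score_fn, n_permutations=1000, seed=0):
    """Score the data under randomly shuffled group labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_scores.append(score_fn(shuffled))
    return null_scores

def normalised_score(observed, null_scores):
    """Return (normalised score, p-value) using the same-sign null scores."""
    same_sign = [s for s in null_scores if (s >= 0) == (observed >= 0)]
    if not same_sign:
        return None, None
    scale = mean(abs(s) for s in same_sign)
    nes = observed / scale if scale else None
    p_value = sum(abs(s) >= abs(observed) for s in same_sign) / len(same_sign)
    return nes, p_value

# Toy stand-in for an enrichment score: difference in group means of a
# hypothetical per-sample value.
values = [5.1, 4.8, 5.3, 7.9, 8.2, 8.0]
labels = ["Control"] * 3 + ["Treated"] * 3

def mean_difference(lbls):
    treated = [v for v, l in zip(values, lbls) if l == "Treated"]
    control = [v for v, l in zip(values, lbls) if l == "Control"]
    return mean(treated) - mean(control)

observed = mean_difference(labels)
null = permutation_null(labels, mean_difference, n_permutations=200)
nes, p = normalised_score(observed, null)
print(nes, p)
```

More permutations give a finer-grained null distribution at the cost of a longer run time.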

GSEA Results

When the task completes, double click on the GSEA task node to view the report.

The report consists of two parts: the GSEA result table on the right and the filter panel on the left.

The comparison (i.e. Denominator vs. Numerator) is given at the top of the GSEA table. Each row of the table corresponds to one gene set (pathway) and the gene sets are ranked by the first comparison's normalized enrichment score in descending order.

View. The icons in the View column open the enrichment plot or the extra details report (explanations below).
Gene set ID. The Gene set IDs are based on the gene set file that was selected during setup. Each ID is a link to the details of the selected set.
Gene set size. Number of genes in the set (as specified in the gene set file); click on the number to download the list of genes.

Click on the View enrichment report icon to open a new Data viewer session with the per gene set report. The selected gene set is in the title, at the top of the canvas (Enrichment profile). To quickly switch to another gene set, use the Axis > Content drop-down list. The individual plots are as follows:

Enrichment score. The algorithm walks down the ranked list of all the genes in the model, increasing the running sum (y axis) each time a gene in the current gene set is encountered. Conversely, the running sum is decreased each time a gene not in the current gene set is encountered. The magnitude of the increment depends on the correlation of the gene with the experimental factor. The enrichment score is then the maximum deviation from zero encountered in the random walk (the summit of the curve).
Gene set hits. Each vertical line shows the location of a gene from the current gene set, within the ranked list of all the genes in the model.
Rank metric. The plot shows the value of the ranking metric (y axis) as you move down the ranked list of all the genes in the model (x axis). The ranking metric measures a gene’s correlation with the attribute specified in the comparison.
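The running sum behind the three plots can be sketched as below. This follows the weighted scheme described by Subramanian et al. (increments proportional to the absolute ranking metric, equal decrements for genes outside the set); it is an illustration, not the code behind the report. The function name, the `weight` parameter, and the data are hypothetical.

```python
def enrichment_score(ranked_genes, rank_metric, gene_set, weight=1.0):
    """Return (enrichment score, running-sum curve).

    ranked_genes: gene IDs sorted by descending ranking metric.
    rank_metric:  dict mapping gene ID -> ranking metric.
    gene_set:     iterable of gene IDs in the current gene set.
    """
    members = set(gene_set)
    hit_weights = [
        abs(rank_metric[g]) ** weight if g in members else 0.0
        for g in ranked_genes
    ]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g in ranked_genes if g not in members)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, curve = 0.0, []
    for gene, w in zip(ranked_genes, hit_weights):
        if gene in members:
            running += w / total_hit   # step up on a gene set hit
        else:
            running -= 1.0 / n_miss    # step down on a miss
        curve.append(running)
    # The score is the running-sum value that deviates most from zero.
    return max(curve, key=abs), curve

# Hypothetical ranking metric for six genes.
metric = {"G1": 3.0, "G2": 2.0, "G3": 1.0, "G4": -1.0, "G5": -2.0, "G6": -3.0}
ranked = sorted(metric, key=metric.get, reverse=True)
es, curve = enrichment_score(ranked, metric, {"G1", "G2"})
print(es, curve)
```

Because the increments and the decrements each sum to one, the curve always returns to zero at the end of the ranked list.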

Click on the View extra details plot icon to open a gene set-specific report page.

Leading edge genes. The subset of genes that contribute most to the enrichment score (ES). For a positive ES, the leading edge subset is the set of members that appear in the ranked list prior to the peak score. For a negative ES, it is the set of members that appear subsequent to the peak score.
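As a sketch of how the leading edge subset follows from the running sum, the function below locates the peak and collects the gene set members on the appropriate side of it. It assumes a running sum weighted by the absolute ranking metric and counts the gene at the peak itself as part of the subset for a positive ES; the function name and data are hypothetical.

```python
def leading_edge(ranked_genes, rank_metric, gene_set):
    """Return (enrichment score, leading edge genes)."""
    members = set(gene_set)
    total_hit = sum(abs(rank_metric[g]) for g in ranked_genes if g in members)
    n_miss = sum(1 for g in ranked_genes if g not in members)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("need at least one weighted hit and one miss")

    running, curve = 0.0, []
    for gene in ranked_genes:
        if gene in members:
            running += abs(rank_metric[gene]) / total_hit
        else:
            running -= 1.0 / n_miss
        curve.append(running)

    # Position of the maximum deviation from zero.
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    if curve[peak] >= 0:
        # Positive ES: members up to and including the peak.
        edge = [g for g in ranked_genes[: peak + 1] if g in members]
    else:
        # Negative ES: members after the peak (the peak itself is a miss).
        edge = [g for g in ranked_genes[peak:] if g in members]
    return curve[peak], edge

# Hypothetical ranking metric for eight genes.
metric = {"G1": 4, "G2": 3, "G3": 2, "G4": 1, "G5": -1, "G6": -2, "G7": -3, "G8": -4}
ranked = sorted(metric, key=metric.get, reverse=True)
print(leading_edge(ranked, metric, {"G1", "G2", "G7"}))
print(leading_edge(ranked, metric, {"G2", "G7", "G8"}))
```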

The filter panel is used to narrow the list of gene sets. The Results entry shows the number of gene sets currently in the table. Filtering can be performed on: Gene set ID (search for the numeric ID), Gene set description (search for a key word), Gene set size (number of genes in the set), Enrichment score, Normalised enrichment score, P-value, FDR. Click on the black triangle to open the controls for each filter. To remove all the filters, click on the Clear filter link.

Click the Generate filtered node button to perform the filter task based on the specified criteria.

References

1. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545-15550. doi:10.1073/pnas.0506580102
2. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34(3):267-273. doi:10.1038/ng1180

Gene set ANOVA

Gene set ANOVA allows you to perform a 1-way ANOVA to compare different groups at the gene set level. The method takes a normalized gene expression count matrix as input. A gene set is a group of genes defined by the specified database, such as a GO term or a KEGG pathway.

The setup is similar to setting up an ANOVA model for gene expression analysis, except that only one factor can be added to the model. In addition, the following extra terms are added to the model by the task automatically:

Gene ID - Since not all genes in a functional group are expressed at the same level, gene ID is added to the model to account for gene-to-gene differences.
Factor * Gene ID - The interaction of gene ID with the factor is added to detect changes in the expression pattern of a gene set across the levels of the factor, referred to as disruption. For instance, if some genes in a gene set are up-regulated in the treatment group while other genes are down-regulated in the treatment group, this is called gene set disruption.
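One way to write down the model described above is shown here. The notation is an assumption made for illustration (the symbols do not come from the product documentation): y is the normalized expression value, F the selected factor, and G the gene ID.

```latex
y_{ijk} = \mu + F_i + G_j + (F \times G)_{ij} + \varepsilon_{ijk}
```

Here i indexes the levels of the factor, j the genes in the gene set, and k the samples within a level; \mu is the overall mean and \varepsilon_{ijk} the residual error. The F_i term captures a shift of the whole gene set between groups, while the (F \times G)_{ij} term captures disruption, that is, genes within the set responding differently to the factor.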

Running Gene set ANOVA

Select the data node with normalized data and then go to Biological interpretation > Gene set ANOVA.

Use the first dialog to specify the gene set database. You can run Gene set ANOVA on pathways (currently based on Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways) or on other gene set databases. The Gene set size option allows you to restrict your analysis to gene sets of a certain size (i.e. number of genes). Make sure the feature identifier in the data contains the gene symbol/gene name, which is used to map to the database.

Once your choices are made, push Next to proceed.

In the second part of the setup, pick the experimental factor; only one factor can be selected.

Click Next to set up comparisons.

Click Finish to run. Each comparison will be performed individually and generate its own section in the report.

Click on the Configure icon to access the advanced options.

Gene Set ANOVA Results

When the task completes, double click on the Gene Set ANOVA node to view the report.

The report consists of two parts: the result table on the right and the filter panel on the left.

The comparison (i.e. Denominator vs. Numerator) is given at the top of the table. Each row of the table corresponds to one gene set (pathway) and the gene sets are ranked by the first comparison's p-value in ascending order.

View. The icons in the View column open the dot plot or the extra details report (explanations below).
Gene set ID. The Gene set IDs are based on the gene set file that was selected during setup. Each ID is a link to the details of the selected set.
Gene set size. Number of genes in the set (as specified in the gene set file); click on the number to download the list of genes.

Click on the dot plot icon to open the viewer.

The plot displays the genes of the selected gene set. The x-axis represents the genes within the gene set, the y-axis represents the mean gene expression value, and each dot represents one of the groups in the comparison.

Click on the View extra details icon to open a gene set-specific report page. The model used for the computation is included in this report.

Correlation Engine Pathway

What is Correlation Engine pathway analysis?

Correlation Engine (CE) Pathway analysis helps determine whether your gene(s) of interest, identified by differential analysis in Connected Multiomics, correspond to gene or protein sets from the GO consortium, MSigDB, TargetScan, and InterPro.

Running Correlation Engine pathway

The Correlation Engine pathway task can be invoked on a differential analysis output data node. The filtered differential analysis output is recommended because it includes the genes of interest between comparisons.

Click a filtered feature list data node
Click the Biological interpretation section of the toolbox
Click Correlation Engine pathway

Select one or more contrasts of interest for Correlation Engine pathway analysis

Click Finish to run

The result is stored under a Correlation engine node. To open it, double click on the node or select the respective Task report from the context-sensitive menu.

Task report

Use the dropdown list to switch between different contrasts. For each contrast, the report is a table with one pathway per row (Gene set column; the entries are clickable hyperlinks), with the category name in the Title column. The Taxonomy column lists the database sources, while the Description column provides more information about the pathway.

Illumina has developed the Running Fisher algorithm to perform pathway analysis in CE. See more details about the calculation of Direction, Normalized enrichment score, Enrichment score, and P-value in our technical note:

Visualizing Correlation Engine pathway results

The pathways can be visualized in the Data Viewer only if the report table has fewer than 100 pathways (rows).

To make visualization easier, ICM includes the “Open Data Viewer auto session” link. Clicking it opens a Data Viewer session with the top 30 pathways ranked by Normalized enrichment score for the contrast.

Two plots are loaded into the Data Viewer. Both plots show the Normalized enrichment score on the horizontal axis and pathways (i.e. the ones present in the gene enrichment table) on the vertical axis. The plots show Normalized enrichment scores (Normalized enrichment score column of the task report table) and, in addition, the plot on the left uses a color range to depict enrichment directions (blue = Up, red = Down).

To customize the content plotted, filter down the number of results. Type the value in the text box in the column header and hit Enter (an example using a cut-off based on the Normalized enrichment score is shown below). Once the number of results falls below 100, the View plots in Data Viewer icon (“Open Data Viewer custom session”) will be displayed. Click the link to open a new Data Viewer session.
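The two selection rules described in this section (the top 30 pathways for the auto session, and fewer than 100 rows for a custom session) can be mimicked on an exported report table as follows. This is only a sketch: the row layout, the `nes` field name, and the function names are hypothetical and do not reflect the product's export format.

```python
def top_pathways(rows, n=30):
    """Return the n rows with the highest normalized enrichment score."""
    return sorted(rows, key=lambda row: row["nes"], reverse=True)[:n]

def filter_for_custom_session(rows, min_nes, limit=100):
    """Apply a score cut-off and report whether the result can be plotted."""
    kept = [row for row in rows if row["nes"] >= min_nes]
    return kept, len(kept) < limit

# Hypothetical report table with 150 pathways.
rows = [{"gene_set": f"pathway_{i}", "nes": i / 10} for i in range(150)]

auto = top_pathways(rows)
kept, plottable = filter_for_custom_session(rows, min_nes=6.0)
print(len(auto), len(kept), plottable)
```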
