5-base DNA
Getting Started
Creating a Study from an ICA Project
Viewing Results and Navigating in ICM
Demo Data
Demo data for following along with this walkthrough is available in the Connected Multiomics Demo Data repository, at /Multiomics-Demo-Data/Methylation/Illumina 5-base-solution. The demo data consists of 6 samples from two phenotype groups. This walkthrough outlines analysis steps for exploring the data, identifying differentially methylated regions between the two sample groups, and finding pathways that are overrepresented in the differential test result.
5 files per sample are required to analyze the DNA Methylation Prep data in the Connected Multiomics software. Add the following 5 files for each sample from the demo data folder to a study prior to starting an analysis:
<sample name>.CX_report.txt.gz
<sample name>.methyl_metrics.csv
<sample name>.mapping_metrics.csv
<sample name>.wgs_coverage_metrics.csv
<sample name>.M-bias.txt
These files are generated by DRAGEN analysis. The CX_report file is the key output file; it contains methylation read counts at single-nucleotide resolution. The metrics files and the M-bias file contain QC metrics for read mapping quality and methylation calling, which are used to generate visualizations in the 5-base Methylation QC task in Connected Multiomics.
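Before adding files to a study, it can help to confirm that every sample has all five files. The following is a minimal sketch of such a check and is not part of Connected Multiomics; the sample names and file listing are hypothetical, and only the five file suffixes come from the list above.

```python
# File suffixes required per sample, taken from the list above.
REQUIRED_SUFFIXES = (
    ".CX_report.txt.gz",
    ".methyl_metrics.csv",
    ".mapping_metrics.csv",
    ".wgs_coverage_metrics.csv",
    ".M-bias.txt",
)


def missing_files(sample_names, available_files):
    """Return {sample: [missing file names]} for samples lacking a required file."""
    available = set(available_files)
    report = {}
    for sample in sample_names:
        missing = [
            sample + suffix
            for suffix in REQUIRED_SUFFIXES
            if sample + suffix not in available
        ]
        if missing:
            report[sample] = missing
    return report


# Hypothetical sample names and file listing, for illustration only.
files = [f"S1{suffix}" for suffix in REQUIRED_SUFFIXES] + ["S2.CX_report.txt.gz"]
for sample, missing in missing_files(["S1", "S2"], files).items():
    print(sample, "is missing:", ", ".join(missing))
```

In this example S1 is complete, so only S2 is reported.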
Custom 5-base Methylation Analysis

Creating a Custom Analysis
After all 6 samples are added to a study, follow these steps to create a Custom analysis in Connected Multiomics:
Click on + New Analysis.
In the pop-up window, provide a name for the analysis, select Custom: 5-base Methylation as the Analysis Type, choose a sample group to be included in the analysis (all samples option is selected by default), and click on the Run Analysis button.

Refresh the page to get the latest status of the analysis.
A Status of Complete indicates that the analysis has been launched. Click on the analysis tile to enter the analysis module. You will see an ongoing Import cohort task that is importing the data into this analysis. After the Import cohort task is completed, the first data node, called 5-base Methylation, is generated.
To review the number of samples and features, hover over the 5-base Methylation data node. Features refer to CpG sites.

The 5-base Methylation data node contains raw methylated and unmethylated counts for the CpG sites present in the CX reports. For sites with methylation calls on both strands in the CX report, the strands are collapsed: the position on the positive strand is used and the methylation counts are summed. This data node also contains percent methylation levels, which are used in exploratory analyses such as Principal Component Analysis (PCA).
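The strand collapsing and percent methylation described above can be sketched as follows. This is an illustration under stated assumptions, not the software's implementation: the input tuples are hypothetical, and the sketch assumes that the minus-strand cytosine of a CpG is reported one base after the plus-strand cytosine, so it is shifted back by one position before the counts are summed.

```python
from collections import defaultdict


def collapse_cpg_strands(rows):
    """Sum counts of both strands of each CpG onto the plus-strand position.

    rows: iterable of (chrom, pos, strand, n_methylated, n_unmethylated).
    Assumption: a minus-strand call at position p belongs to the CpG whose
    plus-strand cytosine is at p - 1.
    """
    totals = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, n_meth, n_unmeth in rows:
        key = (chrom, pos if strand == "+" else pos - 1)
        totals[key][0] += n_meth
        totals[key][1] += n_unmeth
    return {site: tuple(counts) for site, counts in totals.items()}


def percent_methylation(n_meth, n_unmeth):
    """Percent methylation at a site, or None when the site has no coverage."""
    depth = n_meth + n_unmeth
    return 100.0 * n_meth / depth if depth else None


# Hypothetical counts: one CpG called on both strands, one on the plus strand only.
rows = [
    ("chr1", 100, "+", 8, 2),
    ("chr1", 101, "-", 6, 4),
    ("chr1", 250, "+", 0, 5),
]
collapsed = collapse_cpg_strands(rows)
for site, (n_meth, n_unmeth) in collapsed.items():
    print(site, n_meth, n_unmeth, percent_methylation(n_meth, n_unmeth))
```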
Add Sample Metadata
Sample metadata is managed in the Metadata tab within an analysis. Follow these steps to create a new sample attribute called sampleGroup and assign an attribute value to each sample:
Click on the Metadata tab. In the Sample attributes menu on the left, click Manage.
On the Manage sample attributes page, click Add new attribute. Type sampleGroup in the Name text box, then click Add.

Click the + button to add two category values, A and B, to the sampleGroup attribute.

Click Back to metadata tab.
Click Assign values under Sample attributes. Use the dropdown next to each sample to assign a category value for the sampleGroup attribute. Assign the values as shown in the screenshot below.

Click Apply changes to save the assigned values.
5-base Methylation QC
The 5-base methylation QC task in Connected Multiomics enables you to visualize sample-level QC metrics that describe read mapping quality and CpG methylation calling. The QC metrics are extracted from the DRAGEN analysis metric files that were ingested into the study. To invoke the 5-base methylation QC task:
On the Analyses page, click on the 5-base Methylation node.
Click the QA/QC section in the context-sensitive task menu on the right.
Click 5-base methylation QC.
After the 5-base methylation QC task is completed, double-click on the task node to open the QC report in a data viewer. The QC report consists of plots and tables organized in 2 sheets. Click a sheet name at the bottom of the data viewer to navigate from one sheet to another.

The Metrics sheet shows sample-level QC metric plots. Each sample is a data point, and the points are randomly spread out along the x-axis. The y-axis represents the QC metric. Each plot is overlaid with a violin plot to show the distribution of the QC metric.
Percent methylation in samples: Percentages of CpG methylation in samples.
Percent methylation in unmethylated control: Percentage of CpG methylation in the unmethylated control (lambda). Low value indicates good quality.
Percent methylation in methylated control: Percentage of CpG methylation in the methylated control (pUC19). High value indicates good quality.
Percent duplicate reads: Percentage of reads marked as duplicates, which result from PCR amplification.
Percent mapped reads: Percentage of mapped reads, which indicates the alignment rate.
Average autosomal coverage: Mean autosomal coverage across the whole genome. With higher coverage, the methylated and unmethylated counts more accurately reflect the true methylation level at any particular site.
QC metrics table: Text representations of the QC metrics plots.

The M-bias sheet shows M-bias plots of methylation level and coverage across positions on read 1 and read 2. The M-bias should be consistent across all positions. It is common for the first and last 10 bases to show uneven methylation due to end-repair and sequencing artifacts.
PCA
The principal components analysis (PCA) scatter plot allows us to visualize similarities and differences between the samples in a dataset. To invoke a PCA task:
Click on the 5-base Methylation node.
Click Exploratory analysis section in the context-sensitive task menu.
Click PCA.
Set to use the top 100,000 features with the highest variance in calculation.
Keep the rest of the parameters as default, and click Finish.

After the PCA task is completed, double-click on the PCA node to view the PCA plot in a data viewer.

The scatter plot shows the data distribution among the first three PCs. Each sample is a data point.
The scree plot (top right panel) shows variance represented by each PC.
The component loading table (bottom right panel) shows the correlation between each CpG methylation site and each PC.
For additional information on PCA, refer to the PCA documentation.
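The feature selection step in the PCA setup (keeping the features with the highest variance) can be illustrated with a short sketch. This is not the software's implementation; the feature IDs and percent methylation values are hypothetical, and population variance is an assumed choice of variance estimator.

```python
from statistics import pvariance


def top_variance_features(matrix, n_top):
    """Return the n_top feature IDs with the highest variance across samples.

    matrix: {feature_id: [percent methylation value per sample]}.
    """
    ranked = sorted(matrix, key=lambda feature: pvariance(matrix[feature]), reverse=True)
    return ranked[:n_top]


# Hypothetical percent methylation values for three CpG sites in four samples.
matrix = {
    "cpg_a": [10, 90, 12, 88],  # differs strongly between samples
    "cpg_b": [50, 51, 49, 50],  # nearly constant
    "cpg_c": [20, 40, 25, 45],  # moderate spread
}
selected = top_variance_features(matrix, n_top=2)
print(selected)
```

Sites that barely vary across samples, such as cpg_b here, contribute little to separating samples, which is why they are dropped before the components are computed.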
Detect Differentially Methylated Regions (DMRs)
DSS (Dispersion Shrinkage for Sequencing data) detects differentially methylated regions using count data at single-nucleotide resolution. It models the methylation counts at each CpG site with a beta-binomial distribution and uses a Wald test to identify differentially methylated loci (DMLs). Nearby DMLs are then merged to form a differentially methylated region (DMR). Set up a DSS task to identify DMRs between two sample groups:
Click on the 5-base Methylation node.
Click Statistics section in the context-sensitive task menu.
Click Differential Methylation.
Select DSS as the Method to use for differential methylation analysis, click Next.
Select sampleGroup as factor for analysis, click Next.

Drag A to the top right Numerator box and B to the bottom right Denominator box. Click on Add comparison.
Keep the rest of the settings as default, then click Finish.

When the DSS task is completed, double-click on the A vs B (DMR) node to open the DMR report. The DSS DMR task report lists regions in rows and test statistics (areaStat, diff.Methy, etc.) in columns. Regions are listed in descending order of abs(areaStat), so the most significant DMR is listed first. The diff.Methy statistic reports the difference in average methylation between the two groups: a negative value indicates that A is hypomethylated compared to B in the region, while a positive value indicates that A is hypermethylated compared to B. Refer to the DSS documentation to learn more about the differential methylation report.
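The report ordering and the sign convention of diff.Methy can be sketched as follows. The region records are hypothetical values for illustration only; the column names areaStat and diff.Methy are the ones shown in the report.

```python
# Hypothetical DMR records, for illustration only.
dmrs = [
    {"region": "chr1:1000-1500", "areaStat": -42.0, "diff.Methy": -0.31},
    {"region": "chr2:2000-2300", "areaStat": 15.5, "diff.Methy": 0.12},
    {"region": "chr3:500-900", "areaStat": 60.2, "diff.Methy": 0.45},
]


def direction(diff_methy):
    """Interpret the sign of diff.Methy for an A vs B comparison."""
    if diff_methy > 0:
        return "A hypermethylated vs B"
    if diff_methy < 0:
        return "A hypomethylated vs B"
    return "no difference"


# Most significant regions first: descending order of abs(areaStat).
ranked = sorted(dmrs, key=lambda row: abs(row["areaStat"]), reverse=True)
for row in ranked:
    print(row["region"], row["areaStat"], direction(row["diff.Methy"]))
```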
On the DMR report, click on the volcano icon next to the comparison name to open a differential methylation plot in a Data Viewer. Each data point in the plot is a region. The plot can be colored based on user-defined hypo- and hypermethylation thresholds:
Click anywhere within the plot canvas on the top panel to select the plot.
Click the Configure icon on the left, then click Style. In the Style dialog, set the Color by option to Significance.
Click the Configure icon on the left, then click Statistics. In the Statistics dialog, set X threshold to -0.2 and 0.2. Drag the Y threshold slider to its maximum.
The regions are now colored in the volcano plot. Hypomethylated regions (diff.Methy < -0.2) are colored in blue, hypermethylated regions (diff.Methy > 0.2) are colored in red.

Filter DMRs
We recommend filtering DMRs by hypo- or hypermethylation status, using the diff.Methy statistic, so that downstream pathway results can be interpreted as hypo- or hypermethylated in the differential comparison. To filter the DMR results to hypermethylated DMRs:
Click A vs B (DMR) node.
Click Filtering section in the context-sensitive task menu.
Click Differential analysis filter.
Choose Metadata as Filter type.
In Filter criteria section, set Filter features by include A vs B: diff.Methy > 0.2, then click Finish.

This generates a Filtered features list node that contains the DMRs passing the filtering criteria. The same steps can be applied to generate a filtered list of hypomethylated DMRs by setting the filtering criteria to include regions with diff.Methy < -0.2. The filtering threshold can be adjusted, and more filtering criteria can be added, based on your research questions.
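The filtering logic amounts to a threshold on diff.Methy in each direction. The sketch below illustrates it outside the software; the region records are hypothetical, and the default of 0.2 mirrors the threshold used in this walkthrough.

```python
def split_dmrs(dmrs, threshold=0.2):
    """Split DMRs into hypermethylated and hypomethylated lists by diff.Methy."""
    hyper = [row for row in dmrs if row["diff.Methy"] > threshold]
    hypo = [row for row in dmrs if row["diff.Methy"] < -threshold]
    return hyper, hypo


# Hypothetical DMR records, for illustration only.
example_dmrs = [
    {"region": "chr1:1000-1500", "diff.Methy": -0.31},
    {"region": "chr2:2000-2300", "diff.Methy": 0.12},
    {"region": "chr3:500-900", "diff.Methy": 0.45},
]
hyper, hypo = split_dmrs(example_dmrs)
print("hypermethylated:", [row["region"] for row in hyper])
print("hypomethylated:", [row["region"] for row in hypo])
```

Regions whose absolute diff.Methy does not exceed the threshold, such as the chr2 region here, fall into neither list.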
Annotate DMRs
Next, annotate the filtered DMR list with gene information using an annotation model.
Click Filtered feature list node.
Click Region analysis section in the context-sensitive task menu.
Click Annotate regions.
For this demo dataset, set Assembly to Homo sapiens (human) - hg38 and choose GENCODE Genes - release 44 as the Annotation model. Keep the remaining settings as default, then click Finish.

When the task is completed, double-click the Annotated regions node to open the annotation report. The annotation report shows a pie chart of the gene section breakdown for the DMRs, and a table in which each row is a DMR and the columns contain the annotated gene information.
Click Optional columns at the top right of the table, then tick the gene_name checkbox to display gene names in the table.

Gene Set Enrichment
Gene set enrichment analysis identifies gene sets and pathways that are over-represented in a list of significant genes, providing clues to the biological meaning of your results.
Click Annotated regions node.
Click Biological interpretation section in the context-sensitive task menu.
Click Gene set enrichment.
Select KEGG database as Database for pathway enrichment analysis. Choose Homo sapiens hsa_v12_25_04_07 from the KEGG database dropdown.
At Feature identifier section, tick Select feature identifier checkbox, select gene_name, then click Finish.

When the task is completed, double-click the Pathway enrichment node to open the pathway enrichment report. Each row in the report is a pathway, with an enrichment score and p-value. The report also lists how many genes in the pathway were in the input gene list and how many were not. Click on the pathway ID in the first column to view the pathway diagram. On the pathway diagram, clicking a gene name opens the corresponding KEGG page for additional details.
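To give a feel for how over-representation is commonly quantified, the sketch below computes a one-sided hypergeometric p-value from the kind of counts shown in the report (pathway genes in and not in the input list). This is an assumption for illustration: the exact test and enrichment score used by Connected Multiomics are not described here, and the function name and the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n_list, n_pathway, n_background):
    """P(X >= k) for a hypergeometric draw.

    k: pathway genes found in the input list
    n_list: size of the input gene list
    n_pathway: number of genes in the pathway
    n_background: number of genes in the background set
    """
    upper = min(n_list, n_pathway)
    total = comb(n_background, n_list)
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_list - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 40 pathway genes appear in a 200-gene input list,
# drawn from a background of 20,000 genes.
p_value = hypergeom_enrichment_p(k=8, n_list=200, n_pathway=40, n_background=20000)
print(f"p = {p_value:.3g}")
```

A small p-value means that the overlap between the input list and the pathway is larger than expected if the list had been drawn at random from the background.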

