DRAGEN Protein Quantification

Get Started

Introduction

The End-to-End (E2E) Illumina Protein Prep (IPP) workflow combines Illumina chemistry, SOMAmer technology, and DRAGEN data analysis into a comprehensive, automated NGS-based proteomics solution. This E2E solution provides the following:

NGS readout of more than 9.5K unique protein targets for a single plasma or serum sample.
From sample to processed results in under 2.5 days with just 4 hours of hands-on time.
Integrated analysis via BaseSpace Sequence Hub (BSSH), Illumina Connected Analytics (ICA), and (powered by ).
- Includes both local and cloud solutions for planning a run and for processing data with DRAGEN Protein Quantification.

End-to-End Overview

The E2E Illumina Protein Prep solution integrates automation steps to perform sample preparation, protein capture, sequencing, and bioinformatics analysis. Once sequencing is finished, the DRAGEN Protein Quantification application automatically initiates in BSSH or ICA. The diagram below illustrates the E2E workflow.

For documentation on the assay and automation components of Illumina Protein Prep, please refer to the Illumina Protein Prep Product Documentation (document # 200045446).

Versioning

Unless otherwise specified, this documentation covers DRAGEN Protein Quantification v2.2.2.

Prerequisites

Before setting up and running the Illumina Protein Prep End-to-End (E2E) solution, ensure that the necessary software, tools, and configurations are in place. These prerequisites vary depending on the environment used to run the secondary analysis (e.g., cloud or local). Follow the steps below to configure the instrument and software appropriately.

Instrument Software Prerequisites

Illumina Protein Prep Automation System Output Files

The Illumina Protein Prep Automation System Output File (IPPAS output file) is a .csv file that's produced after automated library prep is completed. It contains the following fields:

Column Header

Description

Local Software Install

Moving application install file to the server:

Via USB:

Run the following command:
    cd /
Run the following command to identify which USB ports are in use:
    lsblk -I 8 -d
Record the ports that are currently in use.
Run the following command to create the USB drive mount directory on the DRAGEN server:
    mkdir /media/usb
Connect the USB drive with the DRAGEN Protein Quantification Pipeline installer to the front of the DRAGEN server.
Run the following command to confirm the USB drive name and details:
    lsblk
INFO: The details include the name of the USB drive under the Name column (sda, sdb, sdc, or sdd). The partition name also displays under the drive name (for example, sdc1).
Compare the USB ports that display to the ports identified in step 4 and identify the new port that appears. This port is where the installation software is located.
Run the following command to mount the USB drive to the USB mount directory of the DRAGEN server:
    mount /dev/<port> /media/usb/
For example:
    mount /dev/sdc1 /media/usb
Run the following command to find the SHA value in the installer file:
    head -n25 /media/usb/install_DRAGEN_Protein_Quantification_v<version>.run | grep '^SHA'
Review the following table for SHA values.
SW Version
SHA
WARNING: If the SHA values do not match, stop the installation and contact Illumina Technical Support.
Run the following command to make sure that the USB drive is mounted to the USB mount directory:
    lsblk -I 8 -d
Run the following command to confirm that install_DRAGEN_Protein_Quantification_v<version>.run is in the USB drive mount directory:
    ls /media/usb/
Run the following command to copy install_DRAGEN_Protein_Quantification_v<version>.run to the staging directory:
    cp /media/usb/install_DRAGEN_Protein_Quantification_v<version>.run /staging/
Unmount the USB drive from the mount directory:
    umount /dev/<usb partition name>
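
The SHA check in the steps above can also be scripted. The following is a minimal Python sketch, not part of the Illumina tooling: it mirrors the `head -n25 ... | grep '^SHA'` command by collecting lines that start with `SHA` from the first 25 lines of the installer. The function names `embedded_sha_lines` and `sha_matches` are hypothetical, the exact layout of the SHA line inside the installer is not documented here, and the expected value must come from the SHA table for your software version.

```python
from pathlib import Path


def embedded_sha_lines(installer, max_lines=25):
    """Return the lines starting with 'SHA' among the first `max_lines` lines.

    Equivalent in spirit to: head -n25 <installer> | grep '^SHA'
    The file is read in binary mode because a .run installer may carry a
    binary payload after its text header.
    """
    found = []
    with open(installer, "rb") as handle:
        for line_number, raw in enumerate(handle):
            if line_number >= max_lines:
                break
            text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
            if text.startswith("SHA"):
                found.append(text)
    return found


def sha_matches(sha_lines, expected_sha):
    """True if the expected SHA string appears in any extracted line.

    Assumption: the SHA line contains the value verbatim somewhere after
    the 'SHA' prefix; the exact line format is not specified in this guide.
    """
    return bool(expected_sha) and any(expected_sha in line for line in sha_lines)
```

If `sha_matches` returns `False`, treat it the same way as a manual mismatch: stop the installation and contact Illumina Technical Support.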

Via Cloud Download

Navigate to staging:
    cd /staging/
Download the installer from its online location.

Via Connected Server

Copy the installer from an attached server:
    cp <external location of installer>/install_DRAGEN_Protein_Quantification_v<version>.run /staging/

Installation of application:

Run the following command to change directories to staging:
    cd /staging/
If necessary, change the permissions on the .run file so that it is executable with the following command

Run Setup

Run Planning with the BSSH Run Planner Tool

To plan a successful sequencing run, a sample sheet with details on run configuration (e.g., sequencer type, flow cell, and sample type) is required. Follow the instrument-specific steps below to create a sample sheet compatible with the Illumina Prep Kit and DRAGEN Protein Quantification.

Log in to BaseSpace Sequence Hub (BSSH) and select your workgroup.
In the Run Planning tool, configure the settings described in the following table. Some settings are instrument specific. When you select the DRAGEN Protein Quantification application, the library prep kit and index adapter kit populate automatically, along with additional instrument-specific settings.

Sample Sheet Fields

A sample sheet is required to start secondary analysis. It can be created with the BSSH Run Planner Tool (recommended), with the Excel Sample Sheet Generator, or manually. The following table describes the sample sheet fields and their values, which depend on the environment used to execute the DRAGEN Protein Quantification application. The pipeline can be executed in the cloud using either Illumina Connected Analytics or BaseSpace Sequence Hub, or locally using a phase 4 DRAGEN Server.

This is a non-comprehensive list of fields.

Section

Field

Value

Sample Sheet Examples

Examples of local and cloud sample sheets for NovaSeq 6000 and NovaSeq X are attached to this page.

Local

Cloud

For additional information, refer to the ICS .

Local Sample Sheet Generation Tool

The Run Planning web interface, accessible via BaseSpace Sequence Hub (BSSH), requires an internet connection. The Proteomics Sample Sheet Generator tool (linked below) allows offline creation of sample sheets compatible with the local DRAGEN Protein Quantification application.

Prerequisites

Local Proteomics Sample Sheet Generator Excel Workbook.
One or more IPPAS output files (one per plate).
Access to a DRAGEN Server configured for the Illumina Protein Quantification Local Secondary Analysis (see the prerequisites page for more information).

Steps

Download and open the Local Proteomics Sample Sheet Generator Excel file.
Navigate to the Start tab and follow the instructions described in the upper left portion of the sheet:
1. Fill in the RunName and output_file_prefix.

Proteomics Sample Sheet Generator

Lane Splitting and Multi-Analysis by Project

Lane Splitting

The purpose of lane splitting is to enable the reuse of sample indexes (barcodes) on the same flow cell. To accomplish this, the samples indexed with the same barcodes must be physically separated by placing them on different lanes of the flow cell.

Currently, lane splitting in Illumina Protein Prep is supported only on the NovaSeq X platform. To use lane splitting, indicate in the sample sheet which lane(s) each sample is loaded on. Recommendation: edit the "Lanes" column of the Illumina Protein Prep Automation System output file(s).
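
The lane-splitting rule above (samples that share a barcode must sit on different lanes) can be checked before a run is planned. The following Python sketch is illustrative only: the record keys `sample_id`, `index`, and `lanes`, and the representation of lanes as a list of integers, are assumptions made for this example and do not reflect the actual column layout of the IPPAS output file or the sample sheet.

```python
from collections import defaultdict


def find_index_collisions(samples):
    """Report (lane, index) pairs that are assigned to more than one sample.

    `samples` is an iterable of dicts with the hypothetical keys
    'sample_id' (str), 'index' (str), and 'lanes' (list of int).
    An empty result means every reused index is separated by lane.
    """
    seen = defaultdict(list)  # (lane, index) -> sample IDs using that pair
    for sample in samples:
        for lane in sample["lanes"]:
            seen[(lane, sample["index"])].append(sample["sample_id"])
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

For example, two samples that both use index `IDX01` are acceptable when one is on lanes 1 and 2 and the other on lanes 3 and 4, but the function reports a collision as soon as both list the same lane.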

Counting and Normalization

Cloud Autolaunch Secondary Analysis

DRAGEN Protein Quantification Application

The DRAGEN Protein Quantification application performs counting and normalization for proteomics data from the Illumina Protein Prep pipeline. It converts the binary base call (BCL) files generated by Illumina NovaSeq 6000 or NovaSeq X Series systems into normalized proteomic counts. When sequencing completes, the application starts automatically on BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA).

The sections below describe how to configure the instrument(s) to autolaunch the secondary analysis, how to initiate an analysis manually, and how to requeue an analysis.

Accessing Cloud Results

Primary Metrics

To view primary metrics:

Go to the relevant BaseSpace Sequence Hub (BSSH) workgroup.
Navigate to the "Runs" tab and select the run.
The "Summary" tab gives an overview of the sequencing run quality (e.g., average %Q30, %PF Yield).
Navigate to "Metrics" for detailed per-lane information on all sequencing metrics.

Secondary Results

To view the analysis associated with a specific sequencing run:

Navigate to the "Summary" tab of the relevant run.
Click on the link below "Latest Analysis", which shows the results of the most recent analysis of the run data. For re-queued or re-analyzed runs, previously completed analyses can be found under the "Prior Analysis" section.
Click on "Reports" and find the quality metrics associated with the secondary analysis on the run data.

Note: Users with access to an ICA account who want to view results in ICA can either click "View Files in ICA" in the top-right corner of the BSSH Analysis page or access the analysis directly in ICA. The secondary analysis results in ICA are stored in a BSSH-managed project with the same name as the BSSH workgroup where the analysis was performed.

For further information on tracking and viewing run and analysis results in BaseSpace Sequence Hub, refer to the BSSH documentation.

Local Secondary Analysis

DRAGEN Protein Quantification can be run locally after an Illumina Field Application Scientist (FAS) has installed the local solution on a DRAGEN phase 4 server.

To initiate a run:

run_DRAGEN_Protein_Quantification_<version>.sh \
    -r <full path to run folder> \
    -s <full path to sample sheet> \
    --analysisFolder <full path to output folder>

The --analysisFolder parameter is optional. If no path is provided, output files are written to a folder under the /staging/ directory.

WARNING: Currently, using a path off the DRAGEN server as an --analysisFolder (for example, to network attached storage) may cause an analysis failure. It's recommended to output the results to the DRAGEN server itself.

The -s parameter is optional if the sample sheet file is included in the run folder.
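
When the local analysis is launched from a wrapper script, the optional parameters described above can be assembled programmatically. The following Python sketch only builds the argument list and does not run anything. The helper name `build_run_command` is hypothetical, the `version` string is passed through unchanged because the exact form of `<version>` in the script name is not specified here, and the check for absolute paths is this example's interpretation of "full path".

```python
from pathlib import PurePosixPath


def _require_full_path(label, value):
    """Raise ValueError unless `value` is an absolute POSIX path."""
    if not PurePosixPath(value).is_absolute():
        raise ValueError(f"{label} must be a full path, got: {value!r}")
    return value


def build_run_command(version, run_folder, sample_sheet=None, analysis_folder=None):
    """Build the argument list for the local run script.

    -r is always included. -s is omitted when the sample sheet already
    sits in the run folder, and --analysisFolder is omitted to accept the
    default output location under /staging/.
    """
    command = [
        f"run_DRAGEN_Protein_Quantification_{version}.sh",
        "-r",
        _require_full_path("run folder", run_folder),
    ]
    if sample_sheet is not None:
        command += ["-s", _require_full_path("sample sheet", sample_sheet)]
    if analysis_folder is not None:
        command += ["--analysisFolder", _require_full_path("analysis folder", analysis_folder)]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)` on the DRAGEN server. Keeping the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces.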

For details on the parameters used with the script, execute the following command.

Counting and Normalization

Counting

Protein counting is performed using DRAGEN BCL Convert. Sequencing produces barcoded reads for each sample that correspond to protein abundance. DRAGEN BCL Convert demultiplexes and counts these reads in a single step, and the resulting barcode counts are stored as the Raw Counts ADAT.

QC Summary

QC Checks

Several quality control checks are applied at the plate and sample levels. See the metrics appendix page for a summary of all metrics.

Minimum SOMAmer Read Counts: Non-blank samples with fewer than 10 million reads receive a FLAG for SOMAmerReads_PassFlag in the ADAT. These reads are counted in the raw counts step. Only human protein SOMAmers are part of this count, not controls. There is no specification for blank samples.
Maximum SOMAmer Read Counts: Blank samples with more than 40 million normalized reads will receive a FLAG for SOMAmerNormRead_PassFlag in the ADAT. These reads are counted in the plate scale normalization step. Only human protein SOMAmers are part of this count, not controls. There is no specification for non-blank samples. A plate where 70% of the blanks have a FLAG for this step will receive a WARNING.
Reference Correlation: This step produces a Spearman correlation coefficient describing how similar a sample is to an external Plasma or Serum reference (see below). There is no pass flag for this step.
- The reference used in this step can be found in the SOMAmer metadata, under Ref.MedNormExt.Plasma.QC or Ref.MedNormExt.Serum.QC (dependent on the input type).
Empirical Hyb Temp: This step uses 78 SOMAmer controls with a wide spectrum of melting temperatures (Tm), from 28 °C to 72 °C, which represent the Tm of all probes used in the analysis. The controls are spiked into each sample at equal concentration; controls with higher Tm have higher counts than those with lower Tm because their hybridization is more stable. The distribution of all Tm controls in a sample follows a logistic distribution; its inflection point is the EmpiricalHybTemp. This distribution is calculated using raw counts.
QC Check: This step compares the median of each SOMAmer measurement, across the three QC sample replicates, to an external QC reference. It then calculates a SOMAmer-specific scale factor and a QC metric (QCCheckTailPercent). QCCheckTailPercent corresponds to the percentage of SOMAmers with scale factors outside the specification range (0.8–1.2). If more than 15% of the scale factors are outside of the specification range, the plate receives a FAIL.
- The references used in this step can be found in the SOMAmer metadata, under Ref.QCCheck.Plasma or Ref.QCCheck.Serum (dependent on the input type).
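
The thresholds in the checks above can be expressed compactly in code. The following Python sketch restates the documented limits (10 million reads, 40 million normalized reads, 70% of blanks, the 0.8 to 1.2 scale factor range, and the 15% tail limit). It is not the DRAGEN implementation, and several details are assumptions made for illustration: the function names, the "PASS" label for unflagged samples, inclusive range bounds, treating "70% of the blanks" as "at least 70%", and computing the scale factor as reference divided by replicate median.

```python
from statistics import median

MIN_SOMAMER_READS = 10_000_000        # non-blank samples, raw counts
MAX_BLANK_NORM_READS = 40_000_000     # blank samples, normalized counts
BLANK_FLAG_WARNING_FRACTION = 0.70    # fraction of flagged blanks per plate
SCALE_FACTOR_RANGE = (0.8, 1.2)       # QC Check specification range
MAX_TAIL_PERCENT = 15.0               # plate fails above this percentage


def somamer_reads_flag(reads, is_blank):
    """Minimum read check; blanks have no specification."""
    if is_blank:
        return "PASS"
    return "FLAG" if reads < MIN_SOMAMER_READS else "PASS"


def somamer_norm_reads_flag(norm_reads, is_blank):
    """Maximum normalized read check; non-blanks have no specification."""
    if not is_blank:
        return "PASS"
    return "FLAG" if norm_reads > MAX_BLANK_NORM_READS else "PASS"


def plate_blank_status(blank_flags):
    """WARNING when at least 70% of a plate's blanks are flagged (assumed >=)."""
    if not blank_flags:
        return "PASS"
    flagged = blank_flags.count("FLAG") / len(blank_flags)
    return "WARNING" if flagged >= BLANK_FLAG_WARNING_FRACTION else "PASS"


def qc_scale_factor(replicates, reference):
    """Scale factor for one SOMAmer from its QC replicate measurements.

    Assumption: reference / median; the direction of the ratio is not
    stated in this guide.
    """
    return reference / median(replicates)


def qc_check(scale_factors):
    """Return (QCCheckTailPercent, 'PASS' or 'FAIL') for a plate."""
    low, high = SCALE_FACTOR_RANGE
    outside = sum(1 for factor in scale_factors if factor < low or factor > high)
    tail_percent = 100.0 * outside / len(scale_factors)
    return tail_percent, ("FAIL" if tail_percent > MAX_TAIL_PERCENT else "PASS")
```

Under these assumptions, a plate with exactly 15% of scale factors outside the range still passes, because the documented rule fails a plate only when more than 15% are out of specification.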

Interpretation of Results

Sample Quality

The purpose of a flag is to highlight that a sample required a high degree of correction during the normalization process. This means a sample had sufficiently high or low signal, causing the normalization scale factors to be out of specification. The general recommendation is to exercise caution when using that sample in downstream analysis; normalization may not have been able to properly correct for the large changes in signal.

Output Files

Output Structure

DRAGEN Protein Quantification produces the following key output files in BaseSpace Sequence Hub:

DRAGEN_Protein_Quantification_<SW Version>
- <project> (if no projects are provided, there shall be one folder titled with the RunName)

DRAGEN Report

The DRAGEN Report is an HTML report that provides a quick overview of the quality of an E2E analysis. It consists of three sections, displayed as tabs, which are described below.

Plate QC

This section is subdivided into the following subsections:

Reagent Lot Summary: This table describes the reagents used per plate in the analysis.
Plate QC Summary: This table provides the plate level metrics, including Calibration % in Tails, QC % in Tails, Reference Correlation, and Blank Background metrics.
Calibration Scale Factors: This histogram illustrates the distribution of calibration scale factors for a given plate.
QC Scale Factors: This histogram illustrates the distribution of QC scale factors for a given plate.

Sample QC

This section of the report shows which samples passed specifications and which were flagged, as well as the SOMAmer count yield per sample. It is subdivided into the following subsections:

Sample QC Summary: The table describes the percentage of samples (organized by sample type) that passed Quality Control.
QC Summary: The heatmap shows the QC status of samples based on their position in the plate wells.
Flagged Samples: The table identifies samples that failed one or more sequencing or normalization specifications.

Specification

This report section details the QC metrics and normalization steps.

After Counting and Normalization

Using the ADAT

ADATs can be analyzed in R or Python using parsers created by SomaLogic.

DRAGEN Protein Quantification v2.0.0 is compatible with SomaData v1.0.0 (Python - formerly called Canopy) and SomaDataIO v6.1.0 (R).

Python:
R:

SomaData creates an ADAT object, which is an extension of a pandas DataFrame.

Compatibility with Excel

While using the R and Python parsers is the recommended way to manipulate ADAT files, it's also possible to open and view them in Excel.

Opening an ADAT in Excel (Windows)

Open a blank Excel workbook and browse for a file.

References

Known Limitations

Known Limitations/Issues with DRAGEN Protein Quantification v2.2.2 Software:

Unable to support sample IDs with "_S" followed by a number. Please rename your samples to not include this string.
Unable to support multiple plates with different probe plate values in a single analysis. Please process plates that share the same probe plate value together, and run a separate analysis for each probe plate value.
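
The sample ID limitation above can be caught before a run is submitted. The following Python sketch flags IDs that contain "_S" followed by a digit. The function name `unsupported_sample_ids` is hypothetical, and treating the match as case-sensitive is an assumption, since the limitation is stated only for an uppercase "S".

```python
import re

# "_S" immediately followed by a digit, anywhere in the sample ID.
_UNSUPPORTED_PATTERN = re.compile(r"_S\d")


def unsupported_sample_ids(sample_ids):
    """Return the sample IDs that contain '_S' followed by a number."""
    return [sid for sid in sample_ids if _UNSUPPORTED_PATTERN.search(sid)]
```

Any ID returned by this check should be renamed in the sample sheet before analysis.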

Documentation Revision History

Date

Changes Made