DRAGEN Protein Quantification

Introduction

The End-to-End (E2E) Illumina Protein Prep (IPP) workflow combines Illumina chemistry, SOMAmer technology, and DRAGEN data analysis into a comprehensive, automated NGS-based proteomics solution. This E2E solution provides the following:

  • NGS readout of more than 9.5K unique protein targets for a single plasma or serum sample.

  • From sample to processed results in under 2.5 days with just 4 hours of hands-on time.

  • Integrated analysis via BaseSpace Sequence Hub (BSSH), Illumina Connected Analytics (ICA), and Illumina Connected Multiomics (powered by Partek).

    • Includes both local and cloud solutions for planning a run and for processing data with DRAGEN Protein Quantification.

End-to-End Overview

The E2E Illumina Protein Prep solution integrates automation steps to perform sample preparation, protein capture, sequencing, and bioinformatics analysis. Once sequencing is finished, the DRAGEN Protein Quantification application automatically initiates in BSSH or ICA. The diagram below illustrates the E2E workflow.

For documentation on the assay and automation components of Illumina Protein Prep, please refer to the Illumina Protein Prep Product Documentation (document # 200045446).

Versioning

Unless otherwise specified, this documentation covers DRAGEN Protein Quantification v2.2.2.

E2E Illumina Protein Prep (IPP) workflow.

Lane Splitting and Multi-Analysis by Project

Lane Splitting

The purpose of lane splitting is to enable the reuse of sample indexes (barcodes) on the same flow cell. To accomplish this, the samples indexed with the same barcodes must be physically separated by placing them on different lanes of the flow cell.

Currently, lane splitting in Illumina Protein Prep is supported only on the NovaSeq X platform. To use lane splitting, indicate in the sample sheet which lane(s) each sample is found in. The recommended approach is to edit the "Lanes" column of the Illumina Protein Prep Automation System output file(s):

  1. Find the Lanes column in the Illumina Protein Prep Automation System output file.

  2. Add the lane numbers in which the sample is located as a comma-separated list. Example: '1,2,3,4' for a sample located in lanes 1, 2, 3, and 4.

  3. Upload the modified Illumina Protein Prep Automation System output file to the Run Planner.
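The Lanes edit described in step 2 can also be scripted. The following is a minimal Python sketch, assuming the output file is a plain CSV with a header row that contains a Lanes column. The function name `set_lanes` and the three-column example content are hypothetical and only illustrate the edit; a real output file contains more fields.

```python
import csv
import io


def set_lanes(csv_text, lanes):
    """Return csv_text with the Lanes column of every row set to e.g. '1,2,3,4'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    if "Lanes" not in fieldnames:
        raise ValueError("no Lanes column found in the header row")
    # Comma-separated, no spaces, as in the example above.
    lanes_value = ",".join(str(lane) for lane in lanes)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["Lanes"] = lanes_value
        writer.writerow(row)
    return out.getvalue()


# Hypothetical, shortened file content for a plate with barcode "A".
example = "Sample ID,PlateBarcode,Lanes\nS1,A,\nS2,A,\n"
updated = set_lanes(example, [1, 2, 3, 4])
print(updated)
```

Because the lane list itself contains commas, the CSV writer quotes the value so that it stays in a single column.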

Multi-Analysis by Project

The goal of multi-analysis by project is to improve flexibility by enabling multiple analysis outputs from a single sequencing run without needing to requeue the sample sheet. A single project must include at least one plate (with its controls) and can include multiple plates used in a sequencing run. Each project produces one set of output files (including normalized ADATs and a DRAGEN Report).

To use multi-analysis by project, include the project name in the Project column of the Illumina Protein Prep Automation System output files. If no project information is provided, DRAGEN Protein Quantification will assume all plates belong to the same project and will assign the project name based on the user provided 'run name'. The multi-analysis by project feature is available for both the NovaSeq 6000 and NovaSeq X platforms.

  1. Find the Project column in the Illumina Protein Prep Automation System output file.

  2. Edit the Project column in the Illumina Protein Prep Automation System output file by assigning a project name to each sample. The project name should contain only alphanumeric characters and underscores.

  3. Upload the modified Illumina Protein Prep Automation System output file to the Run Planner.

Note: If you want all samples from a sequencing run to be analyzed together, DO NOT include any values in the "Project" column of the sample sheet.

Note: If project splitting is enabled, all samples on one plate must be included in the same project. Samples from one plate CANNOT be split across projects.
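The two project rules above (allowed characters, and no plate split across projects) can be checked before upload. This is a minimal Python sketch, assuming the sample rows have already been read into dictionaries keyed by the Project and PlateBarcode column headers. The function name `check_projects` and the messages it returns are hypothetical and not part of DRAGEN Protein Quantification.

```python
import re
from collections import defaultdict

# Alphanumeric characters and underscores only, per the rule above.
PROJECT_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_projects(rows):
    """Return a list of problems found in the Project column of the sample rows."""
    problems = []
    projects_by_plate = defaultdict(set)
    for row in rows:
        project = row.get("Project", "")
        projects_by_plate[row["PlateBarcode"]].add(project)
        # An empty value is allowed: it means "analyze everything together".
        if project and not PROJECT_NAME.fullmatch(project):
            problems.append(f"invalid project name: {project!r}")
    for plate, projects in sorted(projects_by_plate.items()):
        if len(projects) > 1:
            problems.append(
                f"plate {plate} is split across projects: {sorted(projects)}"
            )
    return problems


rows = [
    {"PlateBarcode": "A", "Project": "ProjectA"},
    {"PlateBarcode": "B", "Project": "ProjectB"},
]
print(check_projects(rows))
```

In this sketch, a plate that mixes empty and non-empty Project values is also reported as split, which is an assumption about how such a file should be treated.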

Lane Splitting and Multi-Analysis Examples

The following example shows what the first lines of two individual Illumina Protein Prep Automation System (IPPAS) output files may look like after the project and lanes values are added. The plate with barcode "A" adds lanes "1,2,3,4" and project value "ProjectA". The plate with barcode "B" adds lanes "5,6,7,8" and project value "ProjectB".

Illumina Protein Prep Automation System Output Files

The Illumina Protein Prep Automation System Output File (IPPAS output file) is a .csv file that's produced after automated library prep is completed. It contains the following fields:

Column Header
Description

Sample ID

Identifier per sample. Specified in Sample Manifest prior to automation.

Well position

Well position of the sample. Specified in Sample Manifest prior to automation.

Project

Optionally specifies which project output the sample belongs to. See Lane Splitting and Multi-Analysis by Project for more details.

PlateBarcode

Barcode associated with plate. Specified in Sample Manifest prior to automation.

Local Secondary Analysis

DRAGEN Protein Quantification can be run locally after the installation of the local solution on a DRAGEN phase 4 server by an FAS.

To initiate a run:

run_DRAGEN_Protein_Quantification_<version>.sh \
    -r <full path to run folder> \
    -s <full path to sample sheet> \
    --analysisFolder <full path to output folder>

The --analysisFolder parameter is optional. If no path is provided, output files will be put in a folder under the /staging/ directory.

  • WARNING: Currently, using a path off the DRAGEN server as an --analysisFolder (for example, to network attached storage) may cause an analysis failure. It's recommended to output the results to the DRAGEN server itself.

The -s parameter is optional if the sample sheet file is included in the run folder.
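When the local script is launched from another program, the optional parameters can be added only when they are needed. The sketch below is a hedged Python illustration that builds the argument list from the options described above. The function name `build_run_command` and the example paths are hypothetical, and the `<version>` placeholder is kept as-is because the installed script name depends on the software version.

```python
import shlex


def build_run_command(version, run_folder, sample_sheet=None, analysis_folder=None):
    """Build the argument list for the local run script described above."""
    cmd = [f"run_DRAGEN_Protein_Quantification_{version}.sh", "-r", run_folder]
    if sample_sheet:
        # -s may be omitted when the sample sheet is already in the run folder.
        cmd += ["-s", sample_sheet]
    if analysis_folder:
        # Optional; keep the output on the DRAGEN server itself (see WARNING above).
        cmd += ["--analysisFolder", analysis_folder]
    return cmd


# Hypothetical paths; replace "<version>" with the installed version string.
cmd = build_run_command(
    "<version>", "/staging/runs/my_run", analysis_folder="/staging/my_output"
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on the DRAGEN server; it is only printed here so that the sketch runs anywhere.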

For details on the parameters used with the script, run the script with the -h option.

Counting and Normalization

Counting

Protein counting is performed using DRAGEN BCL Convert. Sequencing produces barcoded reads for each sample that correspond to protein abundance. Barcoded reads are simultaneously demultiplexed and counted using DRAGEN BCL Convert. These barcode counts are stored as the Raw Counts ADAT.

Normalization Summary

Cloud Autolaunch Secondary Analysis

DRAGEN Protein Quantification Application

The DRAGEN Protein Quantification application is designed to perform counting and normalization for proteomics data from the Illumina Protein Prep pipeline. It converts data from the binary base call (BCL) files, generated by Illumina NovaSeq 6000 or NovaSeq X Series systems, into the normalized proteomic counts. Upon completion of sequencing, the application is automatically initiated for analysis on BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA).

The sections below describe how to configure the instrument(s) to autolaunch the secondary analysis, how to manually initiate an analysis, and how to requeue an analysis.

Setting Up Autolaunch on Sequencing Instrument
  1. On your instrument, log into your workgroup and select the following run setup options:

    1. Workflow: NovaSeq Standard

    2. Read length: 15, 10, 10, 0

  2. [NovaSeq 6000] Upload the v2 sample sheet generated by the BSSH Run Planner tool to the sequencing instrument.

  3. [NovaSeq X Series] No action is required. The v2 sample sheet automatically displays on the instrument as a planned run.

  4. Select the appropriate Illumina Protein Prep custom recipe for your sequencing instrument and flow cell. During the run, data is uploaded automatically to BSSH. Primary analysis and secondary analysis are completed automatically through autolaunch.

  5. [Optional] Use the BaseSpace Sequence Hub to monitor the run from start to completion.

For additional information on autolaunch, refer to the Cloud Analysis Auto-launch page.

Manually Kicking Off Secondary Analysis

If manual mode sequencing is performed, or autolaunch is unsuccessful, you can manually upload a completed run folder to BSSH and kick off the autolaunch analysis. To use this method, the sample sheet must be in the uploaded run folder and must be named "SampleSheet.csv".

Follow the instructions from the BSSH CLI documentation for manually uploading the run folder to BSSH.
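Before uploading, it can be worth confirming that the run folder meets the sample sheet requirement stated above. The following is a small Python sketch. The function name `has_sample_sheet` is hypothetical, and the check only looks for a file named exactly "SampleSheet.csv" at the top level of the run folder; it does not validate the file content.

```python
from pathlib import Path


def has_sample_sheet(run_folder):
    """Return True if run_folder contains a file named SampleSheet.csv."""
    return (Path(run_folder) / "SampleSheet.csv").is_file()


# Example call on the current directory; prints True or False.
print(has_sample_sheet("."))
```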

Requeue

Follow the instructions described on the Requeue Analysis page (Requeue Analysis in BaseSpace tab).

BatchID

User-provided batch identifier

InputType

Sample input type: plasma, serum, plasma_calibrator, serum_calibrator, plasma_QC, serum_QC, or blank. Specified in Sample Manifest prior to automation.

MatrixTubeBarcode

Barcode associated with matrix tube. Specified in Sample Manifest prior to automation.

ControlID

Lot number for associated calibrator, QC, or blank sample.

ProbePlate

Proteomics Probe Plate barcode scanned during Proteomics Assay.

SOMAmerBeadPlate

SOMAmer-Bead Plate Dil 1 barcode.

Lanes

Optionally contains information about lane splitting. See Lane Splitting and Multi-Analysis by Project for more details.


QC Summary

QC Checks

Several quality control checks are applied at the plate and sample level. See the metrics appendix page for a summary of all metrics.

  • Minimum SOMAmer Read Counts: Non-blank samples with fewer than 10 million reads will receive a FLAG for SOMAmerReads_PassFlag in the ADAT. These reads are counted in the raw counts step. Only human protein SOMAmers are part of this count, not controls. There is no specification for blank samples.

  • Maximum SOMAmer Read Counts: Blank samples with more than 40 million normalized reads will receive a FLAG for SOMAmerNormRead_PassFlag in the ADAT. These reads are counted in the plate scale normalization step. Only human protein SOMAmers are part of this count, not controls. There is no specification for non-blank samples. A plate where 70% of the blanks have a FLAG for this step will receive a WARNING.

  • Reference Correlation: This step produces a Spearman correlation coefficient describing how similar a sample is to an external Plasma or Serum reference (see below). There is no pass flag for this step.

    • The reference used in this step can be found in the SOMAmer metadata, under Ref.MedNormExt.Plasma.QC or Ref.MedNormExt.Serum.QC (dependent on the input type).

  • Empirical Hyb Temp: This step uses 78 SOMAmer controls with a wide spectrum of melting temperatures (Tm), from 28 °C to 72 °C, which represent the Tm of all probes used in the analysis. The controls are spiked into each sample at equal concentration; Tm controls with higher Tm have higher counts than those with lower Tm because their hybridization is more stable. The distribution of all Tm controls in a sample follows a logistic distribution; the inflection point is the EmpiricalHybTemp.

  • QC Check: This step compares the median of each SOMAmer measurement, across the three QC sample replicates, to an external QC reference. It then calculates a SOMAmer-specific scale factor and a QC metric (QCCheckTailPercent). QCCheckTailPercent corresponds to the percentage of SOMAmers with scale factors outside the specification range (0.8–1.2). If more than 15% of the scale factors are outside of the specification range, the plate receives a FAIL.

    • The references used in this step can be found in the SOMAmer metadata, under Ref.QCCheck.Plasma or Ref.QCCheck.Serum (dependent on the input type).
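The threshold logic of the Minimum SOMAmer Read Counts and QC Check items can be illustrated as follows. This is a sketch under stated assumptions, not the DRAGEN implementation: the function names are hypothetical, the "PASS" label for in-specification results is an assumption, and the range boundaries are treated as inclusive.

```python
def tail_percent(scale_factors, low, high):
    """Percentage of scale factors outside the range [low, high]."""
    outside = sum(1 for sf in scale_factors if sf < low or sf > high)
    return 100.0 * outside / len(scale_factors)


def qc_check_status(scale_factors):
    """FAIL when more than 15% of QC scale factors lie outside 0.8-1.2."""
    return "FAIL" if tail_percent(scale_factors, 0.8, 1.2) > 15.0 else "PASS"


def somamer_reads_flag(read_count, is_blank):
    """FLAG non-blank samples with fewer than 10 million SOMAmer reads."""
    if is_blank:
        return None  # no minimum-read specification for blank samples
    return "FLAG" if read_count < 10_000_000 else "PASS"


# Ten hypothetical SOMAmer scale factors, two of them outside 0.8-1.2.
factors = [1.0] * 8 + [0.5, 1.5]
print(tail_percent(factors, 0.8, 1.2), qc_check_status(factors))
```

With two of ten scale factors outside the range, the tail percentage is 20%, which exceeds the 15% limit.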

run_DRAGEN_Protein_Quantification_<version>.sh -h

Normalization corrects for sources of confounding variation, such as overall protein concentration differences, minor deviations in volume transfer during the assay, or the efficiency of library preparation steps. It is performed sequentially and produces an individual ADAT file with counts for each normalization step. See the metrics appendix page for a summary of all metrics.

  • Readout Correction: This step uses SOMAmer controls to reduce technical variability introduced in the NGS library prep.

  • Plate Correction: Plate Correction uses positive controls (calibrators) to correct for biases between plates. An external reference provided by Illumina ensures all Illumina Protein Prep plates are comparable.

  • Sample Correction: This step uses protein abundance in each sample to reduce technical variability introduced during protein quantification, and to correct for differences in overall protein concentration.

Normalization Steps in Detail

  • Hybridization Normalization: This step corrects for biases that can occur during the hybridization and sequencing preparation stages of the assay. During hybridization, controls are spiked into each sample; during normalization, the counts for these controls are compared against an internal reference based on the non-blank controls on the plate. A scale factor is calculated for each sample, and if the scale factor is outside of the specification range (0.4–2.5), the sample will receive a FLAG.

  • Internal Reference Median Normalization: This step corrects for differences in the total protein abundance measurement of a sample. It is performed for each dilution group and runs separately for blank and calibrator control samples. A scale factor is calculated for each dilution group, by comparing the observed protein measurements to a reference of expected values for each protein.

    This reference is based on median protein counts across all samples of the same sample type on the same plate. If any of the scale factors for a sample are outside of the specification range (0.4–2.5), the sample will receive a FLAG.

  • Plate Scaling: This step corrects for possible changes in measured total protein counts between plates, when calibrator samples are present. The median of each SOMAmer measurement across the five calibrators is compared to an external calibration reference to calculate a plate scale factor.

    The first plate scaling step compares the calibrator medians to a reference derived from the sequencing instrument (NovaSeq 6000 or NovaSeq X Series) and adjusts the entire plate accordingly.

    The second step compares the scaled calibrator medians to a reference derived from the NovaSeq X Series (10B flow cell). There is no specification range for plate scaling.

    • The references used in this step can be found in the SOMAmer metadata, under Ref.Bridging.<CalibratorId>.<Instrument>.<Flowcell>.<MasterMixLot#>. The reference used by cross-instrument plate scaling is Ref.Bridging.<PlateBarcode>.NovaSeqX.10B.AA.

  • Calibration: This step corrects for batch effects that impact individual SOMAmers.

    The first calibration step (Platform Specific Calibration) compares the median of each SOMAmer measurement across the five Calibrator sample replicates to an external Calibration reference. This reference is derived from runs using the same instrument type, flow cell type, calibrator lot, and sample input type used in the run being analyzed. It then calculates a SOMAmer-specific scale factor and a plate-wide Calibrator metric (PlatformSpecificCalibrationTailPercent). PlatformSpecificCalibrationTailPercent corresponds to the percentage of SOMAmers with scale factors outside the specification range (0.6–1.4). If 15% of SOMAmers fall outside of this specification, the plate receives a WARNING for the PlatformSpecificTailPercent_PassFlag metric.

    The second calibration step (Cross Platform Calibration) compares the updated calibrator medians to a reference derived from the NovaSeq X Series (10B flow cell), using the same calibrator lot and sample input type as the run being analyzed. The scale factors used to align the median of the calibrators to the reference value are applied to all samples on the same plate.

  • External Reference Median Normalization: This step corrects for differences in the total protein signal for samples on each dilution plate. A scale factor is calculated by comparing the observed protein measurements to a reference of expected values for each protein. It is performed on plasma/serum samples and QC samples.

    If any of the scale factors for a sample are outside of the specification range (0.4–2.5), the sample will receive a FLAG.

    • The reference used in this step can be found in the SOMAmer metadata, under Ref.MedNormExt.Plasma or Ref.MedNormExt.Serum (dependent on the input type).
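To make the idea of a median-based scale factor and its specification range concrete, here is a hedged Python sketch. The text above states only that calibrator medians are compared with a reference. The exact formula used here (reference divided by the per-SOMAmer calibrator median, then the median of those ratios) is an assumption made for illustration, as are the function names, the SOMAmer names, and the "PASS" label.

```python
from statistics import median


def plate_scale_factor(calibrator_counts, reference):
    """Assumed formula: median over SOMAmers of reference / calibrator median.

    calibrator_counts maps a SOMAmer name to its counts across the calibrator
    replicates; reference maps a SOMAmer name to its external reference value.
    """
    ratios = [
        reference[somamer] / median(counts)
        for somamer, counts in calibrator_counts.items()
    ]
    return median(ratios)


def scale_factor_flag(scale_factors, low=0.4, high=2.5):
    """FLAG a sample if any of its scale factors is outside 0.4-2.5."""
    out_of_spec = any(sf < low or sf > high for sf in scale_factors)
    return "FLAG" if out_of_spec else "PASS"


# Hypothetical counts for two SOMAmers across five calibrator replicates.
counts = {
    "somamer_a": [90, 100, 110, 100, 100],
    "somamer_b": [200, 200, 200, 210, 190],
}
reference = {"somamer_a": 200, "somamer_b": 400}
print(plate_scale_factor(counts, reference))
print(scale_factor_flag([1.0, 2.6]))
```

Using medians rather than means keeps a single outlying calibrator replicate from dominating the scale factor, which matches the use of median calibrator values described above.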

Outline of the protein capture assay and what parts of the process each normalization step impacts

Accessing Cloud Results

Primary Metrics

To view primary metrics:

  1. Go to the relevant BaseSpace Sequence Hub (BSSH) workgroup.

  2. Navigate to the "Runs" tab and select the run.

  3. The "Summary" tab gives an overview of the sequencing run quality (e.g., average %Q30, %PF Yield).

  4. Navigate to "Metrics" for detailed per lane information on all sequencing metrics.

Secondary Results

To view the analysis associated with a specific sequencing run:

  1. Navigate to the "Summary" tab of the relevant run.

  2. Click on the link below "Latest Analysis" (which displays the results of the most recent analysis of the run data). For re-queued/re-analyzed runs, previously completed analyses can be found under the "Prior Analysis" section.

  3. Click on "Reports" and find the quality metrics associated with the secondary analysis on the run data.

Note: Users who have access to an ICA account and want to view results in ICA can either click "View Files in ICA" in the top right corner of the BSSH Analysis page or access the analysis directly in ICA. The secondary analysis results in ICA are stored in a BSSH-managed project with the same name as the BSSH workgroup where the analysis was performed.

For further information on tracking and viewing run and analysis results in BaseSpace Sequence Hub, refer to the BSSH documentation.

Local Software Install

Moving application install file to the server:

Via USB:

DRAGEN Report

The DRAGEN Report is an HTML report that provides a quick overview of the quality of an E2E analysis. The report consists of three sections, which are displayed as tabs in the report. The following sections describe the tabs.

Plate QC

This section is subdivided into the following subsections:

  • Reagent Lot Summary: This table describes the reagents used per plate in the analysis.

  • Plate QC Summary: This table provides the plate level metrics, including Calibration % in Tails, QC % in Tails, Reference Correlation, and Blank Background metrics.

  • Calibration Scale Factors: This histogram illustrates the distribution of calibration scale factors for a given plate.

  • QC Scale Factors: This histogram illustrates the distribution of QC scale factors for a given plate.

Sample QC

This section of the report contains information on which samples passed or flagged specifications as well as SOMAmer count yield per sample. It is subdivided into the following subsections:

  • Sample QC Summary: The table describes the percentage of samples (organized by sample type) that passed Quality Control.

  • QC Summary: The heatmap shows the QC status of samples based on their position in the plate wells.

  • Flagged Samples: The table identifies samples that failed one or more sequencing or normalization specifications.

  • SOMAmer Read Counts: This graph represents the SOMAmer count for each sample included in the analysis.

Specification

This report section details the QC metrics and normalization steps.

Local Sample Sheet Generation Tool

The Run Planning web interface, accessible via BaseSpace Sequence Hub (BSSH), requires an internet connection. The Proteomics Sample Sheet Generator tool (linked below) allows for offline creation of sample sheets compatible with the local DRAGEN Protein Quantification application.

Prerequisites

  1. Local Proteomics Sample Sheet Generator Excel Workbook.

Documentation Revision History

Date
Changes Made

March 2025

  • Initial Release

May 2025

  • Multiomic Walkthrough Updated

  • ADAT Compatibility with Excel Added

June 2025

Release with v2.1 DRAGEN Protein Quantification.

  • Changes to ADAT Content

    • Header: Addition of Median, Q75, Q90 Intra-Plate and Inter-Plate CVs

    • Header: Removal of PlateRefCorr_PassFlag

    • SOMAmer metadata clean up

    • SOMAmer metadata: Removal of PassFlags for Calibration and QC: not used for analysis

    • Sample metadata: EmpHybTemp_PassFlag added

    • Sample metadata: RefCorr_PassFlag removed

    • Sample metadata: MedNormInt_PassFlags for serum, plasma and QC samples, and MedNormExt_PassFlags from blank and calibrator samples removed

  • Updated hybridization normalization to only use calibrator and QC samples

  • DRAGEN Application Manager support

September 2025

  • Updated 6k and 9.5k SeqID lists in multiomics walkthrough

  • Updated SW version to 2.2.2

November 2025

  • Added Known Limitations page


    The references used in this step can be found in the SOMAmer metadata, under Ref.Bridging.<CalibratorId>.<Instrument>.<Flowcell>.<MasterMixLot#>. The reference used by cross-instrument calibration is Ref.Bridging.<CalibratorId>.NovaSeqX.10B.AA.

    Run the following command:

    1. cd /

  • Run the following command to identify which USB ports are in use:

    1. lsblk -I 8 -d

  • Record the ports that are currently in use.

  • Run the following command to create the USB drive mount directory on the DRAGEN server:

    1. mkdir /media/usb

  • Connect the USB drive with the DRAGEN Protein Quantification Pipeline installer to the front of the DRAGEN server.

  • Run the following command to confirm the USB drive name and details:

    1. lsblk

  • INFO: The details include the name of the USB drive under the Name column (sda, sdb, sdc, or sdd). The partition name also displays under the drive name (for example, sdc1).

  • Compare the USB ports that display to the ports recorded earlier and identify the new port that appears. This port is where the installation software is located.

  • Run the following command to mount the USB drive to the USB mount directory of the DRAGEN server:

    1. mount /dev/<port> /media/usb/

    2. For example: mount /dev/sdc1 /media/usb

  • Run the following command to find the SHA value in the installer file:

    1. head -n25 /media/usb/install_DRAGEN_Protein_Quantification_v<version>.run | grep '^SHA'

  • Review the following table for SHA values.

    SW Version    SHA
    2.2.2         d1000ea10b7eb5483db776efbe4873e80533dd0e5f5d713339a89cf5e7ee1a2b

    1. WARNING: If the SHA values do not match, stop the installation and contact Illumina Technical Support.

  • Run the following command to make sure that the USB drive is mounted to the USB mount directory:

    1. lsblk -I 8 -d

  • Run the following command to confirm that install_DRAGEN_Protein_Quantification_v<version>.run is in the USB drive mount directory:

    1. ls /media/usb/

  • Run the following command to copy install_DRAGEN_Protein_Quantification_v<version>.run to the staging directory:

    1. cp /media/usb/install_DRAGEN_Protein_Quantification_v<version>.run /staging/

  • Unmount USB from mount directory:

    1. umount /dev/<usb partition name>
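The SHA comparison in the steps above can also be automated. The sketch below mirrors the `head -n25 ... | grep '^SHA'` command in Python: it reads the first 25 lines of the installer and looks for a line that starts with "SHA" and contains the expected value. The function name `installer_sha_matches` is hypothetical, and the exact layout of the SHA line inside the installer is an assumption, so the check only tests whether the expected hex string appears on such a line.

```python
def installer_sha_matches(installer_path, expected_sha, max_lines=25):
    """Return True if a line starting with 'SHA' in the installer header
    contains expected_sha."""
    with open(installer_path, "rb") as handle:
        for _ in range(max_lines):
            line = handle.readline()
            if not line:
                break  # file has fewer than max_lines lines
            text = line.decode("ascii", errors="replace").strip()
            if text.startswith("SHA") and expected_sha in text:
                return True
    return False


# SHA value for SW version 2.2.2, copied from the table above.
EXPECTED_SHA_2_2_2 = "d1000ea10b7eb5483db776efbe4873e80533dd0e5f5d713339a89cf5e7ee1a2b"
```

A call such as `installer_sha_matches("/media/usb/install_DRAGEN_Protein_Quantification_v<version>.run", EXPECTED_SHA_2_2_2)` would return False on a mismatch, in which case the WARNING above applies: stop the installation and contact Illumina Technical Support.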

  • Via Cloud Download

    1. Navigate to staging:

      1. cd /staging/

    2. Download the installer from its online location

      1. curl -O <link>

    Via Connected Server

    1. Copy the installer from an attached server:

      1. cp <external location of installer>/install_DRAGEN_Protein_Quantification_v<version>.run /staging/

    Installation of application:

    1. Run the following command to change directories to staging:

      1. cd /staging/

    2. If necessary, change the permissions on the .run file so that it is executable with the following command:

      1. chmod +x install_DRAGEN_Protein_Quantification_v<version>.run

    3. Run the following command to make a temporary directory and install the .run file:

      1. sudo ./install_DRAGEN_Protein_Quantification_v<version>.run --target /staging/

    4. If DAM is not installed, run the following command:

      1. sudo ./dragen-app-manager-<version> --target /staging/

      2. sudo ./install_DRAGEN_Protein_Quantification_v<version>.run --target /staging/

    5. Run the following command to show the contents of the /usr/local/bin/ directory:

      1. ls /usr/local/bin/

    6. Make sure that the /usr/local/bin/ directory has the following scripts:

      1. check_DRAGEN_Protein_Quantification_<version>.sh

      2. uninstall_DRAGEN_Protein_Quantification_<version>.sh

      3. run_DRAGEN_Protein_Quantification_<version>.sh

    7. While in /staging directory, run the following command to confirm that the installation is successful:

      1. check_DRAGEN_Protein_Quantification_<version>.sh

    8. Run the following help command to display the command options:

      1. run_DRAGEN_Protein_Quantification_<version>.sh -h

    9. Perform a mock run test using the following command:

      1. run_DRAGEN_Protein_Quantification_<version>.sh --mock -r /staging/dragen-app-manager/applications/Illumina_DRAGEN_Protein_Quantification_<version>/resources/test-files/mock_run_folder/

    10. Confirm that the output files are present in the folder that was created during the mock test.

    11. If an error occurs during installation, uninstall and reinstall DRAGEN Protein Quantification Pipeline as follows.

      1. Run the uninstall_DRAGEN_Protein_Quantification_<version>.sh script.

      2. Make sure that the following message displays:

        1. Successfully uninstalled DRAGEN_Protein_Quantification scripts, test-data, and images.

    • One or more IPPAS output files (one per plate).

    • Access to a DRAGEN Server configured for the Illumina Protein Quantification Local Secondary Analysis (see prerequisites page for more information).

    Steps

    1. Download and open the Local Proteomics Sample Sheet Generator excel file.

    2. Navigate to the Start tab and follow the instructions described in the upper left portion of the sheet:

      1. Fill in the RunName and output_file_prefix.

      2. Change the InstrumentPlatform and PrepKitName as needed.

      3. Copy and paste values from IPPAS output file(s) not including the headers into the specified fields.

      4. Add the flow cell lanes for each sample as a comma-separated list with no spaces, as needed.

      5. Ensure there are no errors displayed in the Row Check or Overall Check (a green cell is expected).

      6. Navigate to "SaveAsCSV" and save the sheet as a CSV file (the filename must not contain special characters).

      7. The saved file is a sample sheet compatible with local DRAGEN Protein Quantification.

    3. (Recommended) Load this sample sheet onto the sequencer prior to sequencing.

      1. If the sample sheet is not included prior to sequencing, the user must manually reference the sample sheet when running DRAGEN Protein Quantification later.

    4. Analyze the data locally using a DRAGEN Server with DRAGEN Protein Quantification Pipeline (see Local Secondary Analysis for more information).

    Proteomics Sample Sheet Generator

    ProteomicsSamplesheetGenerator.xlsx (524KB)

    Metrics Appendix

    Sample Level Metrics

    Sample level metrics are found in the sample/row-level metadata in the ADAT.

    Metric
    ADAT Field
    Applied To
    Threshold
    Impact to Sample

    SOMAmer Level Metrics

    SOMAmer level metrics are found in the SOMAmer/column-level metadata in the ADAT.

    Metric
    ADAT Field

    Plate Level Metrics

    Plate level metrics are found in the header metadata in the ADAT.

    Metric
    ADAT Field
    Threshold
    Impact to Plate

    Interpretation of Results

    Sample Quality

    The purpose of a flag is to highlight that a sample required a high degree of correction during the normalization process. This means a sample had sufficiently high or low signal, causing the normalization scale factors to be out of specification. The general recommendation is to exercise caution when using that sample in downstream analysis; normalization may not have been able to properly correct for the large changes in signal.

    Flag at SOMAmer Reads

    A flag in a non-blank sample indicates low SOMAmer read depth. DRAGEN Protein Quantification has a minimum read depth to ensure measurement precision is achieved for each sample. Flagged samples may have a decrease in measurement precision.

    Flag at SOMAmer Normalized Reads

    A flag in a blank sample may indicate increased background or contamination.

    Flag at Reference Correlation

    A flag in a blank sample may indicate plasma or serum contamination. In internal studies, uncontaminated blanks generally had low RefCorr values (~0.4), and blanks with greater than 2% plasma or serum contamination were observed to have RefCorr values of at least 0.6.

    Flag at Empirical Hyb Temp

    A flag in a sample indicates that there is an abnormal distribution of Tm controls in that well. This could be caused by:

    • An elevated hybridization temperature in the sample well, causing a right shift in the Tm controls distribution and in EmpiricalHybTemp (>52.4);

    • Tm control counts so distorted by elevated hybridization temperature that their distribution no longer follows a logistic distribution. In this case, the EmpiricalHybTemp cannot be determined from the distribution and no value is provided.

    Flag at Hybridization Normalization

    A flag in a sample indicates that the hybridization controls in that sample had elevated or decreased signal compared to the plate plasma or serum controls. Flags here indicate significant differences between the sample and the non-blank controls. Potential reasons for this could be poor sample quality, protein loss during hybridization (e.g., due to elevated temperature) or other issues that arise during library prep. In this case, other failure modes will likely also be flagged, such as low Raw Counts, or high / low MedNormExt scale factors.

    Flag at Internal Reference Median Normalization

    A flag in a calibrator indicates that the flagged calibrator replicate had a large discrepancy in signal when compared to the median of all calibrators within that plate. A single flagged calibrator will have limited impact on the plate due to use of median calibrator values in analysis. Multiple flagged calibrators could impact plate performance; reach out to Illumina Support for additional information.

    Flag at External Reference Median Normalization

    A flag in a sample indicates that it had a large discrepancy in signal when compared to the external plasma or serum reference. Low signal may be caused by sample degradation or sample dilution; high signal may be caused by hemolysis. A difference in signal could also be caused by the disease state of the sample.

    Plate Quality

    QC Percent in Tails and Calibration Percent in Tails are used to determine plate quality by examining individual SOMAmers in each well. A QC Percent in Tails failure will likely require a repeat library preparation or re-sequencing for this plate; contact Illumina Support for additional information.

    Review the following table for more details.

    Plate Quality Matrix

    Run Quality

    Two additional run quality metrics evaluate the blank samples in a plate (see below). If warnings are observed in one of these metrics, the general recommendation is to use the plate with caution in downstream analysis, since there may have been sample contamination or elevated background. If issues with run quality are seen consistently, there may be issues with the assay setup. Contact Illumina Support for additional information or support.

    Warning for SOMAmer Normalized Reads

    A warning on a plate may indicate plate-wide increased background or contamination.

    Warning at Reference Correlation

    A warning on a plate may indicate plate-wide plasma or serum contamination.

    Output Structure

    DRAGEN Protein Quantification produces the following key output files in BaseSpace Sequence Hub:

    • DRAGEN_Protein_Quantification_<SW Version>

      • <project> (if no projects are provided, there shall be one folder titled with the RunName)

        • adat

          • <output file prefix>_FinalNormStep_MedNormExt.adat: This file contains the final normalized counts for the samples in this project.

          • other_normalization_steps: This folder contains all intermediate normalized ADATs with their respective counts.

        • quality metrics

          • <output file prefix>_run_qc_stats.csv: This file provides a summary of per sample run qc statistics.

        • DRAGEN Report

    • DRAGEN Report: Folder containing the DRAGEN report files for all projects

    • Extra Files

    • ICA Logs: Folder containing information on ICA logs

    Example of Output Structure in BSSH

    The example below contains an analysis configured with two projects, "1_x" and "0_x". Analysis and outputs are processed for each project individually (see the Multi-Analysis section for details).

    Run Planning with the BSSH Run Planner Tool

    To plan a successful sequencing run, a sample sheet with details on run configuration (e.g., sequencer type, flow cell, and sample type) is required. Follow the instrument-specific steps below to create a sample sheet compatible with the Illumina Prep Kit and DRAGEN Protein Quantification.

    1. Log in to BaseSpace Sequence Hub and select your workgroup.

    2. In the Run Planning tool, configure the settings described in the following table. Some settings are instrument specific. When you select the DRAGEN Protein Quantification application, the library prep kit and index adapter kit populate automatically, along with additional instrument-specific settings.

    This table describes the possible configuration settings and values.

    Prerequisites

    Before setting up and running the Illumina Protein Prep End-to-End (E2E) solution, ensure that the necessary software, tools, and configurations are in place. These prerequisites can vary depending on the environment used to run the secondary analysis (e.g., via cloud or locally). Follow the steps below to configure instrument and software appropriately.

    Instrument Software Prerequisites

    • Control Software (v1.8.0 or later)

    Known Limitations

    Known Limitations/Issues with DRAGEN Protein Quantification v2.2.2 Software:

    • Unable to support sample IDs with "_S" followed by a number. Please rename your samples to not include this string.

    • Unable to support multiple plates with unique probe plate values in a single analysis. Please process each set of plates with a unique probe plate together.

    • Only supports 96 samples per PlateBarcode for a single analysis.
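The sample ID and per-plate sample count limitations above can be checked before a run is planned. The sketch below is illustrative only: the function name and the input shape (a list of sample ID and plate barcode pairs) are assumptions and are not part of DRAGEN Protein Quantification.

```python
import re
from collections import Counter

# "_S" followed by a digit anywhere in the sample ID (per the limitation above).
_S_NUMBER = re.compile(r"_S\d")
MAX_SAMPLES_PER_PLATE = 96


def find_limitation_issues(samples):
    """Return a list of issues for (sample_id, plate_barcode) pairs.

    The input shape is a hypothetical convenience, not a DRAGEN format.
    """
    samples = list(samples)
    issues = []
    for sample_id, _ in samples:
        if _S_NUMBER.search(sample_id):
            issues.append(f"{sample_id}: contains '_S' followed by a number")
    per_plate = Counter(plate for _, plate in samples)
    for plate, count in per_plate.items():
        if count > MAX_SAMPLES_PER_PLATE:
            issues.append(f"{plate}: {count} samples exceeds {MAX_SAMPLES_PER_PLATE}")
    return issues
```

An empty list means neither limitation was triggered for the samples provided.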

    Compatibility with Excel

    While using the R and Python parsers is the recommended way to manipulate ADAT files, it's also possible to open and view them in Excel.

    Opening an ADAT in Excel (Windows)

    1. Open a blank Excel workbook and browse for a file.

    Confirm that install_DRAGEN_Protein_Quantification_v<version>.run is present in staging:

    1. ls /staging/

  • Repeat the steps in this section to reinstall DRAGEN Protein Quantification Pipeline.

  • If the reinstallation fails, redo steps to put the application installer on the server again.

    1. Unmount and reinsert USB, proceeding with the Via USB subsection from the Move application installer to server section (unmount using step 17 from the above section of the guide).

  • Repeat steps in this section to reinstall DRAGEN Protein Quantification Pipeline.

  • SampleIDs must be unique prior to uploading IPPAS output files to BaseSpace Sequence Hub run planner tool.

  • Unable to support AA probe pool runs (indicated by the last two characters of the Probe Plate barcode, e.g., PP201013012345678-AA).

  • Known Limitations/Issues with Local Sample Sheet Generation Tool v1.0.0:

    • Sample sheet must be saved as "CSV Comma-Delimited" and not "CSV UTF-8".

    | Calibration Percent in Tails | QC Percent in Tails | Interpretation |
    | --- | --- | --- |
    | WARNING: The median of the calibrator replicates for that plate was sufficiently different from the expected values. | FAIL: The median of the QC replicates for that plate was sufficiently different from the expected QC reference. | Likely a failed run. Some samples may individually pass but should not be used for downstream analysis due to the plate failure. One or more normalization steps after Calibration were unable to rescue the run. Could be caused by a plate-wide issue, including sequencing run failure, automation failure, or reagent issue. |
    | WARNING: The median of the calibrator replicates for that plate was sufficiently different from the expected values. | PASS: The median of the QC samples was sufficiently similar to expected values from the QC reference. | Successful run. One or more normalization steps after the Calibration step have successfully removed the differences observed in calibrators. |
    | PASS: The median of the calibrator samples was sufficiently similar to expected values from the calibration reference. | FAIL: The median of the QC replicates for that plate was sufficiently different from the expected QC reference. | Technically a failed run, but could be a false positive. Relatively rare combination. May indicate a problem with only the QC samples and not the entire plate (e.g., edge effect). |
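The Plate Quality Matrix above can be expressed as a simple lookup. This is a sketch for downstream scripting, not part of the pipeline: the dictionary and function names are hypothetical, the interpretations are abbreviated from the matrix, and combinations the matrix does not list are reported as such rather than guessed.

```python
# Keys are (Calibration Percent in Tails, QC Percent in Tails) results.
# Values are abbreviated from the Plate Quality Matrix above.
PLATE_QUALITY_MATRIX = {
    ("WARNING", "FAIL"): "Likely a failed run",
    ("WARNING", "PASS"): "Successful run",
    ("PASS", "FAIL"): "Technically a failed run, but could be a false positive",
}


def interpret_plate_quality(calibration_result, qc_result):
    """Look up the interpretation for a pair of plate-level results."""
    key = (calibration_result.upper(), qc_result.upper())
    return PLATE_QUALITY_MATRIX.get(key, "Combination not listed in the matrix")
```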

    Each project folder (highlighted in yellow) includes "quality_metrics" and "adat" folders, along with a DRAGEN Report. The final normalized counts file (FinalNormStep_MedNormExt.adat) is available for each project and can be used in exploratory analysis.

    Reference Correlation

    PlateRefCorr_PassFlag

    < 70% of blank samples flagged at RefCorr

    PASS/WARNING

    Platform Specific Plate Scale

    PlatformSpecificPlateScale_ScaleFactor

    N/A

    N/A

    Cross Platform Plate Scale

    CrossPlatformPlateScale_ScaleFactor

    N/A

    N/A

    Cross Platform Calibration

    CrossPlatformCalibrationTailPercent

    N/A

    N/A

    | Metric | Fields | Applies to | Passing criterion | Result |
    | --- | --- | --- | --- | --- |
    | Minimum SOMAmer Read Count | SOMAmerReads; SOMAmerReads_PassFlag | Non-blank samples | SOMAmerReads > 10 million | PASS/FLAG |
    | Maximum SOMAmer Normalized Read Count | SOMAmerNormReads; SOMAmerNormReads_PassFlag | Blank samples | SOMAmerNormReads < 40 million reads | PASS/FLAG |
    | Hybridization Normalization | HybNorm_1_ScaleFactor; HybNorm_PassFlag | Non-blank samples | 0.4 ≤ HybNorm_1_ScaleFactor ≤ 2.5 | PASS/FLAG |
    | Internal Reference Median Normalization | MedNormInt_5e-05_ScaleFactor; MedNormInt_0.005_ScaleFactor; MedNormInt_0.2_ScaleFactor; MedNormInt_PassFlag | Individual blank and calibrator samples, split by dilution group | 0.4 ≤ MedNormInt_5e-05_ScaleFactor ≤ 2.5; 0.4 ≤ MedNormInt_0.005_ScaleFactor ≤ 2.5; 0.4 ≤ MedNormInt_0.2_ScaleFactor ≤ 2.5 | PASS/FLAG |
    | External Reference Median Normalization | MedNormExt_5e-05_ScaleFactor; MedNormExt_0.005_ScaleFactor; MedNormExt_0.2_ScaleFactor; MedNormExt_PassFlag | Individual non-control and QC samples, split by dilution group | 0.4 ≤ MedNormExt_5e-05_ScaleFactor ≤ 2.5; 0.4 ≤ MedNormExt_0.005_ScaleFactor ≤ 2.5; 0.4 ≤ MedNormExt_0.2_ScaleFactor ≤ 2.5 | PASS/FLAG |
    | Correlation with Reference | RefCorr; RefCorr_PassFlag | Blank samples | RefCorr < 0.6 | PASS/FLAG |
    | TM Controls | EmpiricalHybTemp; EmpiricalHybTemp_PassFlag | All samples | EmpiricalHybTemp < 52.4 | PASS/FLAG |
    | Row Check | RowCheck_PassFlag | All samples | The row check is PASS if all Pass Flags in this row are PASS; otherwise it is FLAG | PASS/FLAG |
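To make the per-sample criteria above concrete, the following sketch re-applies a few of them to values read from the QC statistics. It is an illustration under stated assumptions: the function names are hypothetical, the thresholds are copied from the criteria listed above, and the pipeline's own Pass Flag columns remain the authoritative result.

```python
# Thresholds copied from the per-sample criteria above.
SCALE_FACTOR_MIN, SCALE_FACTOR_MAX = 0.4, 2.5
MIN_SOMAMER_READS = 10_000_000  # applies to non-blank samples


def scale_factor_flag(scale_factor):
    """PASS when 0.4 <= scale factor <= 2.5 (HybNorm, MedNormInt, MedNormExt)."""
    in_range = SCALE_FACTOR_MIN <= scale_factor <= SCALE_FACTOR_MAX
    return "PASS" if in_range else "FLAG"


def somamer_reads_flag(reads):
    """PASS when a non-blank sample has more than 10 million SOMAmer reads."""
    return "PASS" if reads > MIN_SOMAMER_READS else "FLAG"


def row_check(pass_flags):
    """PASS only if every Pass Flag in the row is PASS."""
    return "PASS" if all(flag == "PASS" for flag in pass_flags) else "FLAG"


flags = [somamer_reads_flag(12_500_000), scale_factor_flag(1.1), scale_factor_flag(2.7)]
print(row_check(flags))  # FLAG, because 2.7 is above the 2.5 upper bound
```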

    Platform Specific Calibration

    PlatformSpecificCalibrate_<PlateBarcode>_ScaleFactor

    Cross Platform Calibration

    CrossPlatformCalibrate_<PlateBarcode>_ScaleFactor

    QC Check

    QCCheck_<PlateBarcode>_ScaleFactor

    QC Percent in Tails

    QCCheckTailPercent; QCCheckTailPercent_PassFlag

    QCCheckTailPercent < 15%

    PASS/FAIL

    Calibration Percent in Tails

    PlatformSpecificCalibrateTailPercent; PlatformSpecificCalibrateTailPercent_PassFlag

    PlatformSpecificCalibrateTailPercent < 15%

    PASS/WARNING

    SOMAmer Normalized Reads

    PlateSOMAmerNormReads_PassFlag

    < 70% of blank samples flagged at SOMAmerNormReads

    PASS/WARNING
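The plate-level SOMAmer Normalized Reads and Reference Correlation checks both use the criterion "< 70% of blank samples flagged". A minimal sketch of that rule follows; the function name and the list-of-flags input are assumptions made for illustration, and the sketch does not reproduce any other logic the pipeline may apply.

```python
def plate_blank_flag(blank_pass_flags, max_flagged_fraction=0.70):
    """Return PASS when fewer than 70% of blank samples are flagged, else WARNING.

    blank_pass_flags: per-blank-sample results ("PASS" or "FLAG") for one metric.
    """
    flags = list(blank_pass_flags)
    if not flags:
        raise ValueError("no blank samples provided")
    flagged = sum(1 for flag in flags if flag != "PASS")
    return "PASS" if flagged / len(flags) < max_flagged_fraction else "WARNING"
```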

    | Setting | NovaSeq 6000 Value |
    | --- | --- |
    | Instrument Platform | NovaSeq 6000/6000Dx |
    | Secondary Analysis | [Cloud analysis] BaseSpace/Illumina Connected Analytics; [Local analysis] DRAGEN Server |
    | Application | DRAGEN Protein Quantification (select the latest version) |
    | Library Prep Kit | Illumina Protein Prep 9k (auto-populated) |
    | Index Adapter Kit | Illumina DNA-RNA UD Indexes Set A B C D Tagmentation (auto-populated) |
    | [NovaSeq 6000] | These settings are configured automatically and are not editable: 2 indexes; Single Read; 15, 10, 10, 0 |

    1. Lane Splitting is not supported for NovaSeq 6000 runs. Select "Repeat set of samples across all lanes."

    2. Upload Illumina Protein Prep Automation System output file (*.csv) to BSSH Run Planner.

      • Select Import samples, select the CSV file type, and upload the Illumina Protein Prep Automation System output file. The interface highlights invalid values immediately after the file is rendered.

      • [Optional] To include a second plate in the run, repeat the import process and select Add to existing samples when prompted. The new samples are appended to the samples that were uploaded previously.

      WARNING - Sample IDs and index sequences must be unique within a sequencing run. If combining libraries from multiple Illumina Protein Prep runs, avoid combining plates that contain the same sample IDs or index sequences.

    3. [Optional] Multi-project analysis: Users may add Project values either in the Illumina Protein Prep output file (prior to uploading) or in the BSSH Run Planner. See Lane Splitting and Multi-Analysis by Project for more details. WARNING - All samples from the same plate must have the same project (or no project) value.

    4. [Optional] Enter an appropriate output file prefix. This value is used as a part of the prefix for the secondary analysis output file names. The first character must be alphanumeric. For the remaining characters, alphanumeric, hyphens, underscores, and spaces are permitted.

    5. Proceed to the Run Review page and save the planned run.

      • [NovaSeq 6000, cloud analysis] Download the sample sheet and save it to a network location accessible to the sequencing instrument.

      • [NovaSeq 6000 Local analysis] Select Export to download the sample sheet. Save the file to a network location accessible to the sequencing instrument.

    For additional information on run planning, refer to the BSSH Plan Runs documentation.
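The output file prefix rule in the steps above (first character alphanumeric; remaining characters alphanumeric, hyphen, underscore, or space) can be checked with a regular expression before the prefix is entered in the Run Planner. This is a sketch only: the function name is hypothetical, and it assumes "alphanumeric" means ASCII letters and digits, which the source does not state.

```python
import re

# First character: ASCII letter or digit (assumption).
# Remaining characters: letters, digits, hyphen, underscore, or space.
PREFIX_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_\- ]*")


def is_valid_output_prefix(prefix):
    """Return True when the whole prefix matches the documented character rules."""
    return PREFIX_PATTERN.fullmatch(prefix) is not None
```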

    1. Log in to BaseSpace Sequence Hub and select your workgroup.

    2. In the Run Planning tool, configure the settings described in the following table. Some settings are instrument specific. When you select the DRAGEN Protein Quantification application, the library prep kit and index adapter kit populate automatically, along with additional instrument-specific settings.

    This table describes the possible configuration settings and values.

    Setting
    NovaSeq X Value
    1. If lane splitting will be used for this sequencing run, it is recommended to add values to the Lane column of the Illumina Protein Prep Automation System output files locally before uploading. See Lane Splitting and Multi-Analysis by Project for more details.

      1. If lane splitting is not utilized, do not edit the Lane column in the Illumina Protein Prep Automation System output file.

    2. Upload Illumina Protein Prep Automation System output file (*.csv) to BSSH Run Planner.

    For additional information on run planning, refer to the BSSH Plan Runs documentation.

    Additional Notes:

    • DRAGEN Protein Quantification does not support Multiple Analysis on NovaSeq X.

    Illumina Protein Prep custom recipe XML file installed on the sequencing instrument. Illumina provides the file.

    • Illumina Protein Prep NovaSeq 6000 v1.0.xml

  • Illumina Protein Prep Automation System output files, depending on the sequencing set up.

    • NovaSeq 6000 (S4 flow cell)

      • Recommended: Two IPPAS output files / 192 reactions

    • Control Software (v1.3.0 or later)

    • Illumina Protein Prep custom recipe XML file installed on the sequencing instrument. Illumina provides the file.

      • Illumina Protein Prep NovaSeq X 10B v2.0.xml

      • Illumina Protein Prep NovaSeq X 25B 100 cycle v1.0.xml

      • Illumina Protein Prep NovaSeq X 25B 200 cycle v1.0.xml (upon request only)

      • Illumina Protein Prep NovaSeq X 25B 300 cycle v1.0.xml (upon request only)

    • Illumina Protein Prep Automation System output files, depending on the sequencing set up.

      • NovaSeq X (10B flow cell)

        • Recommended: Two IPPAS output files / 192 reactions

      • NovaSeq X (25B flow cell)

    Cloud Secondary Analysis Prerequisites

    • BaseSpace Sequence Hub (BSSH) or Illumina Connected Analytics (ICA) account.

      • All BSSH and ICA subscriptions come with access to the DRAGEN Protein Quantification application.

      • [ICA subscribers] To view results in ICA, select ICA Run Storage in BSSH Workgroup Settings.

      • For more information about which subscription is best for your use case, contact your FAS or TAM.

    • For information on registering a BaseSpace Sequence Hub or Illumina Connected Analytics account, refer to the account management documentation or connect with an Illumina Account Manager.

    Note: When performing analysis with more than one plate, each plate must have its own unique plate barcode value.

    Local Secondary Analysis Prerequisites

    • Root privileges for install

    • Willingness to install Docker (included in DRAGEN Application Manager installation)

    • DRAGEN Phase 4 server with the minimum disk space:

      • NovaSeq 6000, S4 flow cell: ≥ 10 GB

      • NovaSeq X, 10B flow cell: ≥ 20 GB

      • NovaSeq X, 25B flow cell: ≥ 35 GB

    • DRAGEN Phase 4 server with the minimum available RAM:

      • NovaSeq 6000, S4 flow cell: ≥ 75 GB

      • NovaSeq X, 10B flow cell: ≥ 80 GB

      • NovaSeq X, 25B flow cell: ≥ 95 GB

    • External storage drive mounted to a DRAGEN server. This drive must be mounted through a network share and support NFS/CIFS/SMB protocols. Read and write permissions are required to use this network share. For more information, please refer to the Illumina Protein Prep Product Documentation (document # 200045446).
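The minimum disk space and RAM values listed above can be kept in a small lookup for a pre-flight check on the server. This is a hypothetical helper, not an Illumina tool: the dictionary keys, function name, and the way free resources are supplied are all assumptions.

```python
# Minimum free disk space and available RAM in GB, copied from the lists above.
MIN_REQUIREMENTS_GB = {
    ("NovaSeq 6000", "S4"): {"disk": 10, "ram": 75},
    ("NovaSeq X", "10B"): {"disk": 20, "ram": 80},
    ("NovaSeq X", "25B"): {"disk": 35, "ram": 95},
}


def meets_minimums(instrument, flow_cell, free_disk_gb, available_ram_gb):
    """Return True when both the disk and RAM minimums are satisfied."""
    required = MIN_REQUIREMENTS_GB[(instrument, flow_cell)]
    return free_disk_gb >= required["disk"] and available_ram_gb >= required["ram"]
```

Free disk space for a given path can be read with the standard library call `shutil.disk_usage(path).free`, which returns bytes.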

    Confirm that "All Files (*.*)" are searchable.
    1. Accept default parameters from Excel.

    1. View ADAT in Excel!

    Opening an ADAT in Excel (Mac)

    1. Open a blank Excel workbook and select File > Import.

    1. Select the previously downloaded .adat file.

    1. After clicking "Get Data", select "Text file" and click "Import".

    1. Select Delimited (default) and click on "Next >".

    1. Select "Tab" (default) and click on "Next >".

    1. Select "General" (default) and click on "Next >".

    1. Select the sheet and click on "Import".

    1. View the ADAT in Excel!

    Sample Sheet Fields

    A sample sheet is required to kick off secondary analysis. It can be created with the BSSH Run Planner Tool (recommended), with the Excel Sample Sheet Generator, or manually. The following table describes the sample sheet fields and their values, which depend on the environment used to run the DRAGEN Protein Quantification application. The pipeline can be run in the cloud using either Illumina Connected Analytics or BaseSpace Sequence Hub, or locally using a Phase 4 DRAGEN Server.

    Sample Sheet Fields

    This is a non-comprehensive list of fields.

    Section
    Field

    Using the ADAT

    ADATs can be analyzed in R or Python using parsers created by SomaLogic.

    DRAGEN Protein Quantification v2.0.0 is compatible with SomaData v1.0.0 (Python, formerly called Canopy) and SomaDataIO v6.1.0 (R).

    • Python: SomaData (formerly Canopy)

    • R: SomaDataIO

    SomaData creates an ADAT object, which is an extension of a pandas DataFrame.

    Recommended: Four IPPAS output files / 384 reactions

    • rows correspond to samples

    • columns correspond to SOMAmers

    • values are normalized counts

    Below are examples of parsing an ADAT in Python and R.

    Parsing ADAT in Python

    Parser: https://github.com/SomaLogic/Canopy

    import somadata

    # read the ADAT
    my_adat = somadata.read_adat('/path/to/my/file1.adat')

    # retrieve sample metadata (stored in the row index)
    sample_meta = my_adat.index.to_frame(index=False)

    # retrieve SOMAmer metadata (stored in the column index)
    soma_meta = my_adat.columns.to_frame(index=False)

    # retrieve the scale factors for all plates
    plate_scale_factors_dict = my_adat.header_metadata['PlatformSpecificPlateScale_ScaleFactor']

    ### additional optional manipulation

    # concatenate multiple ADATs into a single ADAT
    my_adat2 = somadata.read_adat('/path/to/my/file2.adat')
    my_merged_adat = somadata.smart_adat_concatenation([my_adat, my_adat2])

    # write it to a file (do not reassign: the write call does not return the ADAT)
    my_merged_adat.to_adat('/path/to/merged/adat')

    Parsing ADAT in R

    Parser: https://github.com/SomaLogic/SomaDataIO

    library(SomaDataIO)
    library(purrr)  # provides map(), map_chr(), and the %>% pipe used below

    # check all package functions
    ls("package:SomaDataIO")

    # read the ADAT
    my_adat <- read_adat('/path/to/my/file1.adat')

    # retrieve sample metadata (the non-analyte columns)
    sample_meta <- my_adat[getMeta(my_adat)]

    # retrieve SOMAmer metadata (one row of annotations per analyte)
    soma_meta <- getAnalyteInfo(my_adat)

    # create a function that parses the header into a named list
    parse_adat_header <- function(adat_file, max_lines = 500) {
      header_lines <- readLines(adat_file, n = max_lines)
      table_row <- grep("TABLE_BEGIN", header_lines)
      header_lines[1:(table_row-1)] %>%
        strsplit("\t") %>%
        {
          values <- map(., 2) # Get values from key value pairs
          keys <- map_chr(., 1) # Get keys...
          setNames(values, keys)
        }
    }

    my_adat_header <- parse_adat_header('/path/to/my/file1.adat')

    # retrieve the scale factors for all plates
    plate_scale_factors_dict <- my_adat_header$PlatformSpecificPlateScale_ScaleFactor

    ### additional optional manipulation

    # combine multiple ADATs into a single ADAT
    my_adat2 <- read_adat('/path/to/my/file2.adat')
    my_merged_adat <- rbind(my_adat, my_adat2)

    # write it to a file
    write_adat(my_merged_adat, '/path/to/merged/adat')
    

    Select Import samples, select the CSV file type, and upload the Illumina Protein Prep Automation System output file. The interface highlights invalid values immediately after the file is rendered.

  • [Optional] To include a second plate in the run, repeat the import process and select "Add to existing samples" when prompted. The new samples are appended to the samples that were uploaded previously.

  • Barcode mismatch 1 and 2—No action is required. The default value is set to 1. Do not change this value.

  • If lane splitting will not be utilized, indicate that each sample is present in all lanes. For the first sample, click on the Lanes box and select the first checkbox. This will populate the cell with "1,2,3,4,5,6,7,8". Then, select the Lanes header and click "Fill down". This will add these lane values for all samples.

  • WARNING - Sample IDs and index sequences must be unique within a sequencing run. If combining libraries from multiple Illumina Protein Prep runs, avoid combining plates that contain the same sample IDs or index sequences. When combining libraries with non-unique indexes, ensure they are loaded into different flow cell lanes, and that lane splitting is enabled during sample sheet creation.

  • [Optional] Multi-project analysis: Users may add Project values either in the Illumina Protein Prep output file (prior to uploading) or add values in the BaseSpace Run Planner. See Lane Splitting and Multi-Analysis by Project for more details. WARNING - All samples from the same plate must have the same project (or no project) value.

  • [Optional] Enter an appropriate output file prefix. This value is used as a part of the prefix for the secondary analysis output file names. The first character must be alphanumeric. For the remaining characters, alphanumeric, hyphens, underscores, and spaces are permitted.

  • Proceed to the Run Review page and save the planned run.

    • [NovaSeq X Series, cloud analysis] No action is required. The sample sheet is automatically uploaded to the instrument.

    • [Local analysis] Select Export to download the sample sheet. Save the file to a network location accessible to the sequencing instrument.

    | Setting | NovaSeq X Value |
    | --- | --- |
    | Instrument Platform | NovaSeq X Series |
    | Secondary Analysis | [Cloud analysis] BaseSpace/Illumina Connected Analytics; [Local analysis] Local |
    | [Local analysis] FASTQ file compression format | DRAGEN. This setting is required by default. The setting does not impact DRAGEN Protein Quantification as no FASTQ files are output. |
    | [Local analysis] Generate FastQC metrics | Yes. This setting is optional. The setting does not impact DRAGEN Protein Quantification as no FASTQ files are output. |
    | Read Lengths | Read 1: 15; Index 1: 10; Index 2: 10; Read 2: 0 |
    | Application | DRAGEN Protein Quantification (select the latest version) |
    | Library Prep Kit | Illumina Protein Prep 9k (auto-populated) |
    | Index Adapter Kit | Illumina DNA-RNA UD Indexes Set A B C D Tagmentation (auto-populated) |
    | Override Cycles | These settings are configured automatically and should not be edited. Read 1: Y15; Index 1: I10; Index 2: I10 |

    | Section | Field | Value |
    | --- | --- | --- |
    | Header | FileFormatVersion | Must be "2" |
    | Header | InstrumentPlatform | Must be "NovaSeqXSeries" or "NovaSeq" |
    | Header | RunName | User-provided value |
    | Header | RunDescription | User-provided value (optional) |
    | Reads | Read1Cycle | 15 |
    | Reads | Index1Cycle | 10 |
    | Reads | Index2Cycle | 10 |
    | Sequencing_Settings | LibraryPrepKits | Must be "IlluminaProteinPrep9k" |
    | BCLConvert_Data | Sample_ID | Alphanumeric name up to 100 characters. Letters, numbers, dashes only (or any combination of letters, numbers, and dashes) |
    | BCLConvert_Data | Index | i7 index sequence, including A, C, T or G letters, 10 nucleotides long |
    | BCLConvert_Data | Index2 | i5 index sequence, including A, C, T or G letters, 10 nucleotides long |
    | BCLConvert_Data | Lane | Lane value (a number between 1 and 8, inclusive). Optional if the instrument is NovaSeq 6000; required if the instrument is NovaSeq X |
    | Cloud_Proteomics_Settings (Cloud Analysis) or Proteomics_Settings (Local Analysis) | SoftwareVersion | Three-digit version of the software used in secondary analysis. For example, "2.2.2" |
    | Cloud_Proteomics_Settings (Cloud Analysis) or Proteomics_Settings (Local Analysis) | StartsFromFastq | Must be "false" |
    | Cloud_Proteomics_Settings (Cloud Analysis) or Proteomics_Settings (Local Analysis) | output_file_prefix | User-provided prefix |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | Sample_ID | Alphanumeric name up to 100 characters. Letters, numbers, dashes only (or any combination of letters, numbers, and dashes) |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | PlateBarcode | For each plate, associated plate barcode |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | MatrixTubeBarcode | For each plate, associated matrix tube barcode |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | BatchID | For each plate, user-provided batchID |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | InputType | For each sample, associated input type. Examples: Plasma_Calibrator, Plasma_QC, Plasma, Serum_Calibrator, Serum_QC, Serum, Blank |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | ControlID | ID of the calibrator, QC, and blank lot (applied to controls only) |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | ProbePlate | For each plate, probe plate lot number, from library prep |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | SOMAmerBeadPlate | For each plate, SOMAmer Bead Plate lot number, from library prep |
    | Cloud_Proteomics_Data (Cloud Analysis) or Proteomics_Data (Local Analysis) | WellPosition | For each sample, well position (A1-H12) |
    | Cloud_Settings | GeneratedVersion | Software version of the BSSH Run Planner tool that generated the sample sheet |
    | Cloud_Settings | Cloud_Proteomics_Pipeline (Cloud Only) | ICA path to the proteomics pipeline. For example: urn:ilmn:ica:pipeline:896aa1df-0412-45db-814e-4683b3215203#DRAGEN_Protein_Quantification_2-2-2 |
    | Cloud_Data | Sample_ID | Alphanumeric name up to 100 characters. Letters, numbers, dashes only (or any combination of letters, numbers, and dashes) |
    | Cloud_Data | ProjectName | (optional) User-provided plate-specific project |
    | Cloud_Data | LibraryName | For each sample, must be <Sample_ID>_<index>_<index2> |
    | Cloud_Data | LibraryPrepKitName | Must be "IlluminaProteinPrep9k" |
    | Cloud_Data | IndexAdapterKitName | Must be "IlluminaDNARNAUDISetABCDTagmentation_Proteomics" |

    Samplesheet Examples

    Examples of local and cloud sample sheets for NovaSeq 6000 and NovaSeq X are attached to this page.

    Local

    • SampleSheet_NVSQ6000_local_v2.2.2.csv: Example NovaSeq 6000 samplesheet for local analysis

    • SampleSheet_NVSQX_local_v2.2.2.csv: Example NovaSeq X samplesheet for local analysis

    Cloud

    • SampleSheet_NVSQX_autolaunch_v2.2.2.csv: Example NovaSeq X samplesheet for autolaunch analysis

    • SampleSheet_NVSQ6000_autolaunch_v2.2.2.csv: Example NovaSeq 6000 samplesheet for autolaunch analysis

    For additional information, refer to the ICS Samplesheet documentation.
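For manually created sample sheets, several of the field rules described above (Sample_ID characters and length, 10-nucleotide indexes, lane range, and the LibraryName pattern) can be checked per sample. The sketch below is an illustration, not a validator supplied by Illumina: the function name and argument layout are hypothetical, and it only covers the rules stated in this section.

```python
import re

# Sample_ID: letters, numbers, and dashes only, up to 100 characters.
SAMPLE_ID = re.compile(r"[A-Za-z0-9-]{1,100}")
# Index and Index2: exactly 10 nucleotides drawn from A, C, G, T.
INDEX = re.compile(r"[ACGT]{10}")


def validate_sample_row(sample_id, index, index2, lane=None, library_name=None):
    """Return a list of rule violations for one sample (empty list if none).

    lane is an int (or None when the Lane column is omitted).
    library_name is only checked when provided (Cloud_Data section).
    """
    errors = []
    if not SAMPLE_ID.fullmatch(sample_id):
        errors.append("Sample_ID must be 1-100 letters, numbers, or dashes")
    if not INDEX.fullmatch(index):
        errors.append("Index must be 10 nucleotides (A, C, G, T)")
    if not INDEX.fullmatch(index2):
        errors.append("Index2 must be 10 nucleotides (A, C, G, T)")
    if lane is not None and lane not in range(1, 9):
        errors.append("Lane must be a number between 1 and 8")
    expected_name = f"{sample_id}_{index}_{index2}"
    if library_name is not None and library_name != expected_name:
        errors.append(f"LibraryName must be {expected_name}")
    return errors
```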

    FAQs

    Process FAQs

    • How can I share data with a collaborator in BSSH?

      • See the following documentation: https://help.connected.illumina.com/basespace-sequence-hub/collaborate/share-with-collaborators

    • I sequenced in manual mode or need to kick off autolaunch after sequencing is complete. How can I do this?

      • Use the BSSH CLI to upload the run folder to BSSH. Make sure the samplesheet is named "SampleSheet.csv" and that you are using an up-to-date version of the tool (at least v1.6.1).

      • For additional information, see the following documentation: https://help.connected.illumina.com/basespace-sequence-hub/cmd-line-interfaces/basespace-cli

    • How do I download/export the samplesheet?

      • Go to the BSSH Run Planner:

        • Go to BSSH, select the intended workgroup on the top right, and then select the Runs tab

        • Additional information on the BSSH Run Planner:

    • How can I requeue a run on BSSH with a new samplesheet/new version of the pipeline?

      • Go to the BSSH Run Planner tool and plan a new run with the desired pipeline.

      • Save the exported samplesheet (see the above step for additional information).

      • Go to the original run, and click Status > Requeue > Planned Run.

    • Can multi-analysis by project split samples on the same plate?

      • No. All samples on the same plate must have the same project.

    Metrics and Bioinformatics FAQs

    • What is DRC_Level and why is this used?

      • DRC stands for Dynamic Range Compression. It evens out SOMAmer concentrations from a 5-log dynamic range to a 2-log range. Without DRC, the most abundant SOMAmers would occupy the majority of the sequencing readout capacity, and each sample would require extremely high sequencing depth to cover and accurately quantify the low-abundance SOMAmers. With DRC, probes are grouped by their observed abundance in the absence of DRC, and each group is compressed by a different set ratio. The compression level of each SOMAmer is recorded in the DRC_Level row of the SOMAmer metadata.

    • How can I compare the outputs of DRAGEN Protein Quantification to SomaScan Array, and how do their metrics compare?

    Organism
    # of SOMAmers in the 9.5 Assay
    • For a new run:

      • Select the "New Run" button on the right and plan a new run with the desired pipeline.

      • After ingesting the IPPAS output files in the BSSH Run Planner, the user will reach the following Run Review page:

      • Select the "Export" option on the bottom right to download the samplesheet created

      • Select the "Save as Draft" or "Save as Planned" options to save this run as a draft or planned run respectively

    • For an "Active" or "Planned" run:

      • Select the intended tab mentioned under the Runs header.

      • Select the required run from the list of runs:

    • Select "Use a new Sample Sheet".

    • Upload the new sample sheet.

    • On the next page, click "requeue".

  • Unfortunately, it's not possible to directly compare the outputs of these two products due to the differing DRC strategies used.

  • Metrics and normalization steps have different names in SomaLogic's tools and Illumina's DRAGEN Protein Quantification. See the table below for the approximate mapping.

    Somalogic Normalization Steps
    Illumina Protein Prep Normalization Steps
    Notes

    Raw RFU

    Raw

  • What is Dilution and why is it used?

    • Each plate is split into multiple dilution groups, and SOMAmers are added to the plate based on that group. This is necessary because SOMAmers need to be more concentrated than proteins to facilitate binding. SOMAmers on catch0 beads have a physical concentration cap, so proteins must be diluted to different levels to maintain the required concentration gap relative to the SOMAmers. The dilution group that each SOMAmer is added to can be identified in the SOMAmer metadata.

  • Do the counts in the ADAT represent the original absolute quantification of the sample?

    • No. Counts in the ADAT cover the relative SOMAmer quantification of the sample. Factors like DRC, Dilution, and PCR mean that counts are compressed by differing factors; these factors are not uncompressed in the final ADAT.

  • How should signal and background be compared?

    • To identify background, blank samples on the plate (negative controls) can be used to obtain a per SOMAmer background.

      • Additionally, a global Limit of Detection (LoD) is computed for each SOMAmer for both plasma and serum using internal data. This is in the SOMAmer metadata section of the ADAT.

    • To identify signal, QC samples on the plate (positive controls) can be used to obtain a per SOMAmer signal.

    • When comparing signal to background, it's important to do so on a per SOMAmer level as background can vary from SOMAmer to SOMAmer.

  • Where are the FASTQs?

    • DRAGEN Protein Quantification does not produce FASTQs. By utilizing DRAGEN Counting, the software uses BCL Convert to demultiplex and count proteins per sample at the same time. This reduces analysis time but also removes FASTQ as a pipeline output.

  • What if the PF or Q30 values are lower than expected?

    • First, check secondary metrics. If secondary metrics are passing, it's acceptable to proceed with further analysis. If secondary metrics have warnings or failures, consider re-pooling, or repeating the dilute-and-denature step and sequencing the saved pool again. If the data still looks poor, reach out to Illumina Support for further guidance.

  • Why are there counts for blank samples, and non-human SOMAmer counts for human samples?

    • This is the background of the Illumina Protein Prep assay, which is due to non-specific binding of the SOMAmers. During library prep, all samples will experience SOMAmers sticking to beads or the side and bottom of the plate. These SOMAmers will then be brought through to sequencing. This is one of the reasons this is a relative abundance assay: because of this background, the difference in SOMAmer counts is more important than the absolute value of the counts.

  • Why are there SOMAmer counts in blank samples?

    • This is the background of the Illumina Protein Prep assay, which is due to non-specific binding of the SOMAmers. During library prep, all samples will experience SOMAmers sticking to beads or the side and bottom of the plate. These SOMAmers will then be brought through to sequencing.

  • There are some proteins that multiple SOMAmers map to. Why is this and how does it impact analysis?

    • There are a handful of SOMAmers that map to the same protein. Some were created as Somalogic improved the SELEX process and a new SOMAmer’s affinity to the protein improved. Some may target different domains, isoforms, or cleavage products of a protein. They may or may not compete for the same epitope. Given the multiple reasons for multiple SOMAmers per protein, Illumina doesn't recommend any general methods for analysis that apply to all proteins.

  • What organisms does the SOMAmer metadata cover in the 9.5k assay?

    • The SOMAmer metadata lists the following organisms, with the count for each:

      • Human: 10326

      • Mouse: 230

      • Gila monster: 3

      • Hornet: 3

      • Jellyfish: 3

      • African clawed frog: 3

      • Thermus thermophilus: 3

      • European elder: 2

      • Common eastern firefly: 2

      • E. coli: 1

      • HIV-1: 1

      • HIV-2: 1

      • Bacillus stearothermophilus: 1

      • Ensifer meliloti: 1

      • Red alga: 1


    ADAT Content - 6k

    This page describes the output from the 6k Illumina Protein Prep assay, which differs slightly from the 9.5k assay. Do not reference this page unless your analysis was created on DRAGEN Protein Quantification v1.8 or earlier.

    The primary output file of DRAGEN Protein Quantification is the ADAT. One ADAT is produced for each normalization step performed. The final ADAT produced (<output_file_prefix>_Step7_FinalNormStep_MedNormExt.adat) has the normalized counts at the end of the full normalization process.

    There are four key components to the ADAT: the header, SOMAmer metadata, sample metadata, and counts.

    ADAT Header

    Next, click File > Download > Sample Sheet:

    Hyb Normalization

    HybNorm

    medNormInt

    MedNormInt

    plateScale

    PlatformSpecificPlateScale

    Calibration

    PlatformSpecificCalibrate

    -

    CrossPlatformPlateScale

    Applied to normalize data across instruments

    -

    CrossPlatformCalibrate

    Applied to normalize data across instruments

    -

    MedNormExt

    anmlQC

    -

    qcCheck

    QCCheck

    anmlSMP

    -

    Filtered

    -

    Metric
    Description

    Version

    Version of the software used during analysis

    Title

    User reference for the analysis

    OutputDirectory

    Not applicable

    SOMAmerReferenceSource

    SOMAmer metadata version used during analysis

    AssayType

    Assay types used

    SiteId

    Not applicable

    AssayVersion

    Version of the Illumina Protein Prep assay (for 6k product, value will be Illumina Protein Kit 1.0)

    AssayRobot

    Liquid handling robot

    Sample Metadata

    Metric
    Description

    SampleID

    Sample identifier

    PlateId

    Unique plate identifier

    InputType

    Sample type specified in manifest file.

    Examples: Plasma_Calibrator, Plasma_QC, Plasma, Serum_Calibrator, Serum_QC, Serum, Blank

    MatrixTube

    Matrix used.

    Examples: Plasma, Serum

    SampleType

    Type of sample processed.

    Examples: Plasma, Serum, Blank, QC, Calibrator

    CalibratorId

    Calibrator lot/ID.

    SOMAmer Metadata

    Metric
    Description

    SeqId

    Unique sequence identifier for the SOMAmer

    SeqIdVersion

    SOMAmer sequence version

    SomaId

    Somalogic-provided identifier

    Target

    Protein target identifier

    Target Full Name

    Protein target full name

    Type

    Target type (protein)

    ADAT Layout

    Illumina Connected Multiomics Walkthrough

    Illumina Connected Multiomics provides interactive visualizations and powerful statistics. This is a walkthrough of an analysis that could be done in Connected Multiomics with an example proteomic data set, produced by DRAGEN Protein Quantification. It covers the following features:

    • Creating a default analysis

    • Creating a custom analysis

    • Create a feature list to filter by

    ADAT Content - 9.5k

    This page describes the output from the 9.5k Illumina Protein Prep assay, which differs slightly from the 6k assay. Use this for ADATs made on DRAGEN Protein Quantification v2.0 and higher.

    The primary output file of DRAGEN Protein Quantification is the ADAT. One ADAT is produced for each normalization step performed. The final ADAT produced (<output_file_prefix>_Step7_FinalNormStep_MedNormExt.adat) has the normalized counts at the end of the full normalization process.

    There are four key components to the ADAT: the header, SOMAmer metadata, sample metadata, and counts.

    ADAT Header

    RunId

    Unique identifier for the run (created by sequencing instrument)

    Instrument Type

    Instrument used for sequencing

    NGSLot

    Internal reference for control samples

    Flowcell

    Flowcell used for sequencing

    YieldDemux

    Total number of reads that are demultiplexed

    YieldQ30Demux

    Total number of reads that are demultiplexed with a passing QC score

    Q30WeightedMean

    Q30 primary sequencing metric, computed as the Q30 weighted mean across lanes

    CreatedBy

    Not applicable

    EnteredBy

    Not applicable

    ExpDate

    Not applicable

    CreatedDate

    Date the run was performed

    Notes

    Not applicable

    StudyOrganism

    Sample organism (human)

    StudyMatrix

    Matrix used in study (serum or plasma)

    CalibratorId

    Lot number(s) of the calibrator samples

    ReportConfig

    Configuration used for normalization

    ProcessSteps

    The process steps that occurred to produce the counts in the current ADAT

    PlatformSpecificPlateScale_ScaleFactor

    Scale factors for the PlateScale normalization step, when compared to a platform specific reference. The reference is based on data from the sequencing instrument used for the run.

    PlatformSpecificCalibrateTailPercent

    Percent of Platform Specific Calibration scale factors in tails (outside of the acceptable range of 0.6-1.4) for each plate when compared to a reference. The platform specific reference is based on data from the sequencing instrument used for the run.

    CrossPlatformPlateScale_ScaleFactor

    Scale factors for the PlateScale normalization step, when compared to a universal reference made with NovaSeq X data.

    CrossPlatformCalibrateTailPercent

    Percent of Cross Platform Calibration scale factors in tails (outside of the acceptable range of 0.6-1.4) for each plate when compared to a reference. The cross platform reference is based on data from NovaSeq X runs.

    QCCheckTailPercent

    Percent of QC scale factors in tails for each plate.

    QCCheckTailPercent_PassFlag

    PASS/FAIL for each plate. If QCCheckTailPercent for a plate is greater than .15, this value shall be "FAIL". If it's less than or equal to .15, it shall be "PASS".

    GeneratedBy

    Version of SomaData parser used to write ADAT (ex. SomaData_1.0.0)

    WellPosition

    Location of the sample on the 96 well plate (A1-H12)

    SOMAmerReads

    Number of raw human SOMAmer reads (excluding control reads)

    SOMAmerReads_PassFlag

    This flag indicates whether a non-blank sample meets specifications for number of SOMAmer reads. For non-blank samples:

    • SOMAmerReads ≥ 10 million = PASS

    • SOMAmerReads < 10 million = FLAG

    Not applied to blank samples.

    HybNorm_1_ScaleFactor

    The hybridization control scale factor for hyb plate 1

    HybNorm_2_ScaleFactor

    The hybridization control scale factor for hyb plate 2

    HybNorm_PassFlag

    This flag indicates whether the HybNorm_1_ScaleFactor is within the specified acceptance criteria range of 0.4–2.5.

    MedNormInt_5e-05_ScaleFactor

    The MedNormInt scale factor for the 0.005% dilution group

    MedNormInt_0.005_ScaleFactor

    The MedNormInt scale factor for the 0.5% dilution group

    MedNormInt_0.2_ScaleFactor

    The MedNormInt scale factor for the 20% dilution group

    MedNormInt_PassFlag

    This flag indicates whether all dilution group scale factors are within the specified acceptance criteria range of 0.4–2.5 for the MedNormInt step. Applies to blank and calibrator samples.

    MedNormExt_5e-05_ScaleFactor

    The MedNormExt scale factor for the 0.005% dilution group

    MedNormExt_0.005_ScaleFactor

    The MedNormExt scale factor for the 0.5% dilution group

    MedNormExt_0.2_ScaleFactor

    The MedNormExt scale factor for the 20% dilution group

    MedNormExt_PassFlag

    This flag indicates whether all dilution group scale factors are within the specified acceptance criteria range of 0.4–2.5 for the MedNormExt step. Applied to QC and serum/plasma samples.

    ANML_5e-05_ScaleFactor

    The ANML scale factor for the 0.005% dilution group. Note: ANML norm is not output.

    ANML_0.005_ScaleFactor

    The ANML scale factor for the 0.5% dilution group. Note: ANML norm is not output.

    ANML_0.2_ScaleFactor

    The ANML scale factor for the 20% dilution group. Note: ANML norm is not output.

    ANML_5e-05_fraction_used

    Fraction of probes in 0.005% dilution group used to compute corresponding ANML scale factor.

    ANML_0.005_fraction_used

    Fraction of probes in 0.5% dilution group used to compute corresponding ANML scale factor.

    ANML_0.2_fraction_used

    Fraction of probes in 20% dilution group used to compute corresponding ANML scale factor.

    RowCheck_PassFlag

    This flag indicates whether all row scale factors are within the specified acceptance criteria range.

    PlateRunDate

    Not applicable

    SampleNotes

    Not applicable

    Barcode

    Not applicable

    UniProt

    UniProt identifier

    EntrezGeneID

    Entrez Gene identifier

    EntrezGeneSymbol

    Entrez Gene symbol

    HybControl

    True if the SOMAmer is a hyb control, else False

    MedNormControl

    True if the SOMAmer is used during MedNormExt normalization, else False

    LoD.Plasma

    Limit of detection for plasma SOMAmers

    LoD.Serum

    Limit of detection for serum SOMAmers

    Organism

    Organism (e.g. Human, Mouse) of the SOMAmer

    PlatformSpecificCalibrate_<PlateId>_ScaleFactor

    Scale Factor for Platform Specific Calibration

    PlatformSpecificCalibrate_<PlateId>_PassFlag

    Flag that indicates whether the PlatformSpecificCalibrate scale factor for this SeqId was in the acceptance criteria range of 0.6-1.4. Note: this flag is only used for evaluating the PlatformSpecificCalibrateTailPercent and should not be used to exclude SOMAmers from analysis.

    CrossPlatformCalibrate_<PlateId>_ScaleFactor

    Scale Factor for Cross Platform Calibration

    CrossPlatformCalibrate_<PlateId>_PassFlag

    Flag that indicates whether the CrossPlatformCalibrate scale factor for this SeqId was in the acceptance criteria range of 0.6-1.4. Note: this flag is only used for evaluating the CrossPlatformCalibrateTailPercent and should not be used to exclude SOMAmers from analysis.

    QCCheck_<PlateId>_ScaleFactor

    Scale Factor for QC Check

    QCCheck_<PlateId>_PassFlag

    Flag that indicates whether the QCCheck scale factor for this SeqId was in the acceptance criteria range of 0.8-1.2. Note: this flag is only used for evaluating the QCCheckTailPercent and should not be used to exclude SOMAmers from analysis.

    ColCheck_<PlateId>_PassFlag

    This flag indicates whether all SOMAmer scale factors are within the specified acceptance criteria range. Note: this flag should not be used to exclude SOMAmers from analysis.

    Dilution

    Dilution group classification for the SOMAmer

    DRC

    Compression of SOMAmer range that occurred during the assay.

    HCG

    Hybridization Control Group. Identifies the hybridization plate of the SOMAmer's NGS reporter.

    BlackList

    Identifies SOMAmers with low performing NGS reporters in product testing. SOMAmers with the value TRUE should be excluded from analysis.

    References (Ref.*)

    References available to the SW version used for analysis.

    Units

    Units of matrix content
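    The tail-percent and pass-flag rules in the tables above can be sketched as follows. This is an illustration under assumptions rather than the software's implementation: the tail value is treated as a fraction (so the limit is 0.15), the acceptance range is treated as inclusive at both ends, and the function names and example values are hypothetical.

```python
def tail_fraction(scale_factors, low, high):
    """Fraction of scale factors outside the range [low, high].

    Inclusive bounds are an assumption; the tables only state the range.
    """
    outside = sum(1 for sf in scale_factors if sf < low or sf > high)
    return outside / len(scale_factors)

def tail_pass_flag(fraction, limit=0.15, fail_label="FAIL"):
    """PASS when the tail fraction is at or below the limit, else fail_label."""
    return fail_label if fraction > limit else "PASS"

def somamer_reads_flag(reads, sample_type):
    """SOMAmerReads_PassFlag rule: 10 million reads or more passes.

    Returns None for blank samples, where the flag is not applied.
    """
    if sample_type == "Blank":
        return None
    return "PASS" if reads >= 10_000_000 else "FLAG"

# Hypothetical QC Check scale factors for one plate (range 0.8-1.2).
qc_scale_factors = [0.95, 1.02, 1.30, 0.70, 1.10, 0.99, 1.01, 1.05, 0.90, 1.00]
qc_tail = tail_fraction(qc_scale_factors, 0.8, 1.2)
qc_flag = tail_pass_flag(qc_tail)
print(qc_tail, qc_flag)
print(somamer_reads_flag(12_500_000, "Plasma"))
```

    For the calibration tail metrics, the same helper would be called with the 0.6-1.4 range; where the tables give WARNING as the non-passing value, pass fail_label="WARNING".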

    IE, 9.5k human protein list

  • Managing sample metadata

  • Filtering samples

  • Filtering features

  • Data Transformation

  • PCA

  • Differential expression

  • Hierarchical clustering and creating heatmaps

  • Gene set enrichment analysis

  • For information on the Connected Multiomics Platform, including how to log in, please reference the following documentation: https://help.multiomics.illumina.com/icm

    Demo Data

    Demo data that can be used to follow along with this walkthrough is found in the Connected Multiomics Demo Data repository. To add this dataset to a study, perform the following steps:

    After clicking "+ Add Demo Data", the data used in this walkthrough can be found at /Multiomics-Demo-Data/Proteomics/NovaSeq 6k-S4 Cancer-Normal. For this study, both SampleType (CRC/Control) and TimePoint (T1...T8) are used. This data must be ingested prior to starting the analysis. Add both the ADAT (counts) and TSV (metadata) to the study.

    Creating a Default Analysis

    • Click on '+ New Analysis'.

    • In the pop-up window, provide a name for the analysis, select ‘Default Analysis’ as the Analysis Type, choose the sample group to be included in the analysis ('All ADAT Samples' will be selected by default), and click on the ‘Run Analysis’ button.

    • Exploring the PCA plot:

      • By default, the plot is colored by BatchID. To change this, in the left hand bar, select Configure > Style > Color by.

      • You may also want to explore other principal components. To do this, in the left-hand bar for the PCA plot, select Configure > Axes and update the data for each axis.

    Creating a Custom Analysis

    Custom Analysis Example
    • Click on ‘+ New Analysis’.

    • In the pop-up window, provide a name for the analysis, select ‘Custom Analysis’ as the Analysis Type, choose the sample group to be included in the analysis ('All ADAT Samples' will be selected by default), and click on the ‘Run Analysis’ button.

    • Note: make sure there are no duplicated Sample IDs in the analysis groups.

    • A pop-up message will show up if the analysis creation is successful.

    • Refresh the page to get the latest status of the analysis.

    • When the Status is ‘Complete’, click on the analysis tile to enter the analysis module.

    • There is no default initiated analysis for the custom proteomic data. To review the number of samples and features, hover over the data node.

    • Throughout the below analyses, rectangles/task nodes will produce circles/data nodes. Double clicking on the task node will describe the task as it occurred, and double clicking on the data node will take you to the results of the analysis. Nodes will be greyed out while the analysis is still in progress.

    Create a List of Features for Filtering

    The SOMAmer content by default includes all whitelisted SOMAmers counted during secondary analysis, including controls and non-human proteins. You may want to exclude some SOMAmers from tertiary analysis. One way to do this is to create a saved list of proteins.

    • Click on the setting icon on the top right corner of the analyses dashboard and click on 'Settings' from the menu.

    • On the settings page, click on 'Lists' from the left hand navigation bar, and then click on '+ New list' on the top of the right panel to add new list.

    • On the 'Local file' tab, click on '+ Choose' to select the local file and enter the name of the list in the 'Name' box; click on 'Add list' button to upload the list.

    • During the following analysis, the attached 9.5k human protein list is used. It was generated by filtering SOMAmers by Organism = Human (or only the SOMAmers associated with human proteins) and isolating the SeqIDs. This is recommended for all 9.5k product analyses.

    • If using an Illumina Protein Prep 6k dataset, it's recommended to use the 6k human protein list. It was generated by filtering SOMAmers by Organism = Human (or only the SOMAmers associated with human proteins) and isolating the SeqIDs.
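    The filtering used to generate these lists (keep SOMAmers with Organism = Human, then isolate the SeqIDs) can be sketched as follows. This is a minimal illustration that assumes the SOMAmer metadata has already been read into a list of records keyed by the SeqId and Organism fields; the SeqId values are hypothetical, and parsing the ADAT itself is out of scope here.

```python
def human_seq_ids(somamer_metadata):
    """Return the SeqIds of SOMAmers whose Organism field is Human."""
    return [row["SeqId"] for row in somamer_metadata if row["Organism"] == "Human"]

# Hypothetical SOMAmer metadata records.
metadata = [
    {"SeqId": "seq-0001", "Organism": "Human"},
    {"SeqId": "seq-0002", "Organism": "Mouse"},
    {"SeqId": "seq-0003", "Organism": "Human"},
]

# One SeqId per line, suitable for saving as a text file and uploading as a list.
feature_list = "\n".join(human_seq_ids(metadata))
print(feature_list)
```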

    Managing sample metadata

    • Click on 'Metadata' tab to view and add sample metadata.

    • Click on 'Manage' under 'Sample attributes' to reorder the metadata. Drag the 'SampleStatus' and 'TimePoint' boxes to the front, since these are the attributes used for coloring in the downstream analysis. You can also add, remove, or reorder other metadata, or add a new category to the current metadata, on this page.

    Filtering samples

    • Return to the analysis page, click on the 'Quantification' node, choose 'Filtering' > 'Filter samples' from the right hand tool box.

    • Select the samples with TimePoint T1 and T2.

    Filtering features

    • Click on 'Finish' and return to the analysis page. Click on the 'TimePoint in T1,T2' node, choose 'Filtering' > 'Filter features' from the right hand tool box.

    • If enabled earlier, select 'Saved list' option, choose the previously uploaded "9.5k human protein list - SeqIDs" from the dropdown, and make sure the "Feature identifier" is set to Feature ID; click 'Finish'. Alternatively, upload a manual list of features.

    Data transformation

    • Click on the 'Filtered counts' node, choose 'Normalization and Scaling' > 'Normalization' from the right hand tool box.

    • Choose 'Add' and drag it to the right-hand box. Adding a constant removes 0 count values, which could otherwise impact Limma-trend differential analysis because it assumes continuous data. Then click on 'Finish' to return to the analysis dashboard.
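    The effect of the 'Add' transformation can be sketched as follows. This is only an illustration: the walkthrough does not state the constant that is added, so the offset of 1.0 is an assumption, and the counts are made up.

```python
def add_offset(counts, offset=1.0):
    """Add a constant to every count so that no value is exactly zero.

    The default offset of 1.0 is an assumption for illustration.
    """
    return [[value + offset for value in row] for row in counts]

# Hypothetical counts: rows are samples, columns are SOMAmers.
counts = [[0.0, 5.0, 120.0], [3.0, 0.0, 98.0]]
shifted = add_offset(counts)
print(shifted)
```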

    PCA

    • Click on the 'Normalized counts' node, select 'Exploratory analysis' > 'PCA' from the right hand tool box.

    • Use the default setting and click 'Finish'.

    • Double click on the 'PCA' node to view the PCA report.

      • The scatter plot shows the data distribution (colored by SampleStatus) among the first three PCs.

      • The scree plot (top right panel) shows the variance explained by each PC.

      • The component loading table (bottom right panel) shows the correlation between every protein/SOMAmer and each PC. The variable in this table represents the SOMAmer's SeqID.

      • For additional information on PCA, review the following documentation: https://help.partek.illumina.com/partek-flow/user-manual/task-menu/exploratory-analysis/pca

    Differential expression

    • Click on the 'Normalized counts' node, select 'Statistics' > 'Differential analysis' from the right hand menu.

    • Select 'Limma-trend' (default) method and click 'Next'.

    NOTE: Limma-trend is a robust model that fits the assumptions for small sample sizes of normalized protein counts. The Limma-trend model is also flexible with categorical and quantitative variables. For other datasets or experimental designs, consider other methods.

    • Select 'SampleStatus', 'TimePoint', 'DonorID' then click 'Add factors'. Select 'SampleStatus' and 'TimePoint' then click on 'Add interaction' to add the factors. Click on 'Next' to set up comparisons.

    • Drag 'CRC' to the top right box and 'Control' to the bottom right box. Click on 'Add comparison'. Then select 'SampleStatus*TimePoint' from the Factor dropdown menu. Add T1 and T2 comparisons between CRC and Control. Keep "Combine" selected for each of these comparisons. Click on the 'Finish' button at the bottom.

    • Double click on the 'Limma-trend' node to view the report. On the left hand menu,

      • select 'FDR', choose 'Per contrast' and specify 0.05 for CRC vs Control comparison

      • select 'Fold change', choose 'Per contrast' and specify -2 to 2 for CRC vs Control comparison

      • click on 'Generate Filtered Node'

      • repeat this process on CRC T1 vs Control T1 comparison and CRC T2 vs Control T2 comparison

    • Return to the analyses dashboard and there will be 3 filtered feature list nodes added to the pipeline; right click on the 'Filtered feature list' node and click on 'Rename data node' to rename the node as 'T vs N'; apply the same procedure to the other two filtered feature lists and rename them as 'T vs N Time 1' and 'T vs N Time 2' respectively.

    • To compare the filtered feature lists, click on the 'Venn diagram' on the bottom menu and tick on the filtered lists ('T vs N', 'T vs N Time 1' and 'T vs N Time 2'); then click on 'Display selection' button on the bottom to visualize the Venn diagram.
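    The filtering and list comparison above can be sketched outside the interface as follows. This is an illustration under assumptions, not what Connected Multiomics runs internally: it assumes the fold change filter excludes values between -2 and 2, treats both thresholds as inclusive, and uses hypothetical SeqIds, field names, and statistics.

```python
def filter_features(results, fdr_max=0.05, fc_cutoff=2.0):
    """Keep SeqIds with FDR at or below fdr_max and |fold change| at or above fc_cutoff."""
    return {
        r["SeqId"]
        for r in results
        if r["fdr"] <= fdr_max and abs(r["fold_change"]) >= fc_cutoff
    }

# Hypothetical differential analysis results for the CRC vs Control comparison.
crc_vs_control = [
    {"SeqId": "seq-A", "fdr": 0.01, "fold_change": 3.2},
    {"SeqId": "seq-B", "fdr": 0.20, "fold_change": -4.0},
    {"SeqId": "seq-C", "fdr": 0.03, "fold_change": -2.5},
    {"SeqId": "seq-D", "fdr": 0.04, "fold_change": 1.3},
]

t_vs_n = filter_features(crc_vs_control)

# A second, already filtered list (hypothetical) to mimic the Venn diagram overlap.
t_vs_n_time1 = {"seq-A", "seq-E"}
shared = t_vs_n & t_vs_n_time1
print(sorted(t_vs_n), sorted(shared))
```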

    Hierarchical clustering and creating heatmaps

    • Click on 'T vs N' data node, select 'Exploratory analysis' > 'Hierarchical clustering / heatmap' from the right hand tool box.

    • Choose 'Heatmap' and select the feature order and sample order.

      • Choose 'Cluster' (default) as feature order

      • Choose 'Assign order' and select 'SampleStatus' from the dropdown menu.

      • Click on 'Finish' button at the bottom of the page.

    • Double click on the 'Hierarchical clustering / heatmap' node to view it.

    • For additional information on hierarchical clustering, view the following documentation: https://help.partek.illumina.com/partek-flow/user-manual/task-menu/exploratory-analysis/hierarchical-clustering

    Gene set enrichment

    • Click on 'T vs N' data node, select 'Biological interpretation' > 'Gene set enrichment' from the right hand tool box.

    • Select 'KEGG database', and specify the background gene list as the previously uploaded list of gene symbols. Specifying this list ensures that only genes with associated SOMAmers are included in this analysis. Click on 'Finish' button at the bottom of the page.

    • Double click on the 'Pathway enrichment' node to view the enriched pathways.

      • Click on the pathway name to view the pathway network

      • To download genes in each pathway, click on the value in 'Genes in set' column in the corresponding pathway entry.

    GSEA

    • To detect differential pathways between diseased and control samples, click on the 'Normalized counts' node and select 'Biological interpretation' > 'GSEA' from the right hand tool box.

    • Select 'KEGG database' (default) and click on 'Next' button at the bottom of the page.

    • Select 'SampleStatus' and click on 'Next'.

    • Drag 'CRC' to the top right box and 'Control' to the bottom right box. Keep "Combine" selected for this comparison. Click on 'Add comparison', and then click 'Finish' on the bottom of the page

    • Double click on the 'GSEA' node to view the results.

    • Click on the enrichment plot icon after each row index to visualize the enrichment score of the corresponding pathway.

    NOTE: For clarity on the differences between Gene Set Enrichment Analysis and GSEA, please view this documentation: https://help.partek.illumina.com/partek-flow/frequently-asked-questions#what-is-the-difference-between-gsea-and-gene-set-enrichment

    SeqID_Human_Protein_9.5k.txt (98KB)
    SeqID_Human_Protein_6k.txt (63KB)
    Metric
    Description

    Version

    Version of the software used during analysis

    Title

    User reference for the analysis

    SOMAmerReferenceSource

    SOMAmer metadata version used during analysis

    AssayType

    Assay types used

    AssayVersion

    Version of the Illumina Protein Prep assay (for 9.5k product, value will be Illumina Protein Prep 9k)

    AssayRobot

    Liquid handling robot

    RunId

    Unique identifier for the run (created by sequencing instrument)

    Instrument Type

    Instrument used for sequencing

    Sample Metadata

    Metric
    Description

    SampleID

    Sample identifier

    PlateId

    Unique plate identifier

    MatrixTubeBarcode

    Matrix tube barcode scanned during library prep

    BatchID

    User-provided batch identifier

    InputType

    Sample type specified in manifest file.

    Examples: Plasma_Calibrator, Plasma_QC, Plasma, Serum_Calibrator, Serum_QC, Serum, Blank

    MatrixTube

    Matrix used.

    Examples: Plasma, Serum

    SOMAmer Metadata

    Metric
    Description

    SeqId

    Unique sequence identifier for the SOMAmer

    SeqIdVersion

    SOMAmer sequence version

    SomaId

    Somalogic-provided identifier

    Target

    Protein target identifier

    Target Full Name

    Protein target full name

    Type

    Target type (protein)

    ADAT Layout

    Flowcell

    Flowcell used for sequencing

    YieldDemux

    Total number of reads that are demultiplexed

    YieldQ30Demux

    Total number of reads that are demultiplexed with a passing QC score

    Q30WeightedMean

    Q30 primary sequencing metric, computed as the Q30 weighted mean across lanes

    CreatedDate

    Date the run was performed

    StudyOrganism

    Sample organism (human)

    StudyMatrix

    Matrix used in study (serum or plasma)

    CalibratorId

    Lot number(s) of the calibrator samples

    ProcessSteps

    The process steps that occurred to produce the counts in the current ADAT

    PlateSOMAmerNormReads_PassFlag

    PASS/WARNING for each plate. If more than 70% of blank samples have SOMAmerNormReads_PassFlag = FLAG, PlateSOMAmerNormReads_PassFlag shall be "WARNING". Otherwise, it shall be "PASS".

    PlatformSpecificPlateScale_ScaleFactor

    Scale factors for the PlateScale normalization step, when compared to a platform specific reference. The reference is based on data from the sequencing instrument used for the run.

    PlatformSpecificCalibrateTailPercent

    Percent of Platform Specific Calibration scale factors in tails (outside of the acceptable range of 0.6-1.4) for each plate when compared to a reference. The platform specific reference is based on data from the sequencing instrument used for the run.

    PlatformSpecificCalibrateTailPercent_PassFlag

    PASS/WARNING for each plate. If PlatformSpecificCalibrateTailPercent for a plate is greater than .15, this value shall be "WARNING". If it's less than or equal to .15, it shall be "PASS".

    CrossPlatformPlateScale_ScaleFactor

    Scale factors for the PlateScale normalization step, when compared to a universal reference made with NovaSeq X data.

    CrossPlatformCalibrateTailPercent

    Percent of Cross Platform Calibration scale factors in tails (outside of the acceptable range of 0.6-1.4) for each plate when compared to a reference. The cross platform reference is based on data from NovaSeq X runs.

    QCCheckTailPercent

    Percent of QC scale factors in tails for each plate.

    QCCheckTailPercent_PassFlag

    PASS/FAIL for each plate. If QCCheckTailPercent for a plate is greater than .15, this value shall be "FAIL". If it's less than or equal to .15, it shall be "PASS".

    IntraPlateMedianCV

    Median Percent CV metric for each plate, providing:

    • (sample Standard Deviation of calibrator samples/Mean of calibrator samples)*100

    • (sample Standard Deviation of QC samples/Mean of QC samples)*100

    IntraPlateQ75CV

    Same as IntraPlateMedianCV (above) but showing 75th percentile of Percent CV instead of median

    IntraPlateQ90CV

    Same as IntraPlateMedianCV (above) but showing 90th percentile of Percent CV instead of median

    InterPlateMedianCV

    Median Percent CV metric across plates, providing:

    • (sample Standard Deviation of plate mean count in calibrator samples/Mean of plate mean count in calibrator samples)*100

    • (sample Standard Deviation of plate mean count in QC samples/Mean of plate mean count QC samples)*100

    Provided only if there are 4 or more plates in the analysis

    InterPlateQ75CV

    Same as InterPlateMedianCV (above) but showing 75th percentile of Percent CV instead of median

    InterPlateQ90CV

    Same as InterPlateMedianCV (above) but showing 90th percentile of Percent CV instead of median

    MedianS2B

    Median Signal to Background metric for each plate, providing:

    • Median Calibrator/Median Blank

    • Median QC/Median Blank

    GeneratedBy

    Version of SomaData parser used to write ADAT (ex. SomaData_1.0.0)

    SampleType

    Type of sample processed.

    Examples: Plasma, Serum, Blank, QC, Calibrator

    ControlID

    ID of the calibrator, QC, or blank lot (applied to controls only).

    ProbePlate

    Probe plate lot number, from library prep

    SOMAmerBeadPlate

    SOMAmer bead plate lot number. Last two digits indicate the master mix lot number.

    WellPosition

    Location of the sample on the 96 well plate (A1-H12)

    Project

    (Optional) User-provided project identifier

    SOMAmerReads

    Number of raw human SOMAmer reads (excluding control reads)

    SOMAmerReads_PassFlag

    This flag indicates whether a non-blank sample meets specifications for number of SOMAmer reads. For non-blank samples:

    • SOMAmerReads ≥ 10 million = PASS

    • SOMAmerReads < 10 million = FLAG

    Not applied to blank samples.

    SOMAmerNormReads

    Number of normalized human SOMAmer reads at the plate scale normalization step (excluding control reads)

    SOMAmerNormReads_PassFlag

    This flag indicates whether a blank sample meets the specifications for number of normalized SOMAmer reads. For blank samples:

    • SOMAmerNormReads ≤ 40 million = PASS

    • SOMAmerNormReads > 40 million = FLAG

    Not applied to non-blank samples.

    RefCorr

    Spearman correlation value for comparing samples to the plasma or serum reference.

    EmpiricalHybTemp

    This indicates the actual hybridization temperature of a sample, obtained empirically from a set of 78 temperature controls.

    EmpiricalHybTemp_PassFlag

    This flag indicates whether a sample meets the specification for hybridization temperature.

    • EmpiricalHybTemp < 54.2 = PASS

    • EmpiricalHybTemp > 54.2 = FLAG

    HybNorm_1_ScaleFactor

    The hybridization control scale factor

    HybNorm_PassFlag

    This flag indicates whether the HybNorm_1_ScaleFactor is within the specified acceptance criteria range of 0.4–2.5.

    MedNormInt_5e-05_ScaleFactor

    The MedNormInt scale factor for the 0.005% dilution group

    MedNormInt_0.005_ScaleFactor

    The MedNormInt scale factor for the 0.5% dilution group

    MedNormInt_0.2_ScaleFactor

    The MedNormInt scale factor for the 20% dilution group

    MedNormInt_PassFlag

    This flag indicates whether all dilution group scale factors are within the specified acceptance criteria range of 0.4–2.5 for the MedNormInt step. Applies to blank and calibrator samples.

    MedNormExt_5e-05_ScaleFactor

    The MedNormExt scale factor for the 0.005% dilution group

    MedNormExt_0.005_ScaleFactor

    The MedNormExt scale factor for the 0.5% dilution group

    MedNormExt_0.2_ScaleFactor

    The MedNormExt scale factor for the 20% dilution group

    MedNormExt_PassFlag

    This flag indicates whether all dilution group scale factors are within the specified acceptance criteria range of 0.4–2.5 for the MedNormExt step. Applied to QC and serum/plasma samples.

    RowCheck_PassFlag

    This flag indicates whether all row scale factors are within the specified acceptance criteria range.

    UniProt ID

    UniProt identifier

    Entrez Gene ID

    Entrez Gene identifier

    Entrez Gene Symbol

    Entrez Gene symbol

    HybControl

    True if the SOMAmer is a hyb control, else False

    LoD.Plasma

    Limit of detection for plasma SOMAmers

    LoD.Serum

    Limit of detection for serum SOMAmers

    Organism

    Organism (e.g. Human, Mouse) of the SOMAmer

    PlatformSpecificCalibrate_<PlateId>_ScaleFactor

    Scale Factor for Platform Specific Calibration

    CrossPlatformCalibrate_<PlateId>_ScaleFactor

    Scale Factor for Cross Platform Calibration

    QCCheck_<PlateId>_ScaleFactor

    Scale Factor for QC Check

    Dilution

    Dilution group classification for the SOMAmer

    DRC_Level

    Compression of SOMAmer range that occurred during the assay.

    References

    References used in the analysis.

    Units

    Units of matrix content
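    The Percent CV formula given for the IntraPlate CV metrics above, (sample standard deviation / mean) * 100, can be sketched as follows. This is an illustration only: it assumes Percent CV is computed per SOMAmer across replicate control samples on a plate and then summarized by the median, which is one reading of the metric description. The SeqIds and counts are hypothetical.

```python
from statistics import mean, median, stdev

def percent_cv(values):
    """Percent CV: sample standard deviation divided by the mean, times 100."""
    return stdev(values) / mean(values) * 100

def median_percent_cv(replicates_by_seq_id):
    """Median of the per SOMAmer Percent CV values."""
    return median(percent_cv(values) for values in replicates_by_seq_id.values())

# Hypothetical normalized counts for calibrator replicates on one plate.
calibrator_counts = {
    "seq-A": [90.0, 100.0, 110.0],
    "seq-B": [200.0, 200.0, 200.0],
    "seq-C": [80.0, 100.0, 120.0],
}

plate_median_cv = median_percent_cv(calibrator_counts)
print(round(plate_median_cv, 2))
```

    The Q75 and Q90 variants would summarize the same per SOMAmer values with the 75th and 90th percentiles instead of the median.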

    https://help.partek.illumina.com/partek-flow/user-manual/task-menu/exploratory-analysis/pca