
Pre-analysis tools

This section contains tools that are useful for preparing single cell data for downstream analysis, such as multi-sample comparison or multiomics analysis. To invoke the Pre-analysis tools, click any Single cell counts data node. The tools include the following tasks:

Pseudobulk
Split by feature type
Generate group cell counts
Merge matrices

Pseudobulk

The Pseudobulk task combines the expression values of all cells with the same cell type classification within each sample. In essence, it creates virtual bulk-level data from single cell data. Because the output is bulk-level data, any task that can be performed on bulk gene counts data can also be performed on the output of the Pseudobulk task.

Pseudobulk makes it easy to compare gene or protein expression for a cell type of interest between experimental groups.

Before running Pseudobulk, you must classify the cells. To run Pseudobulk, select the data node containing your classified cells and invoke the task.

Select how you would like to group the cells. By default, sample name is selected; this option pools all the cells in a sample to generate sample-level expression. You can add other attributes from the drop-down list, e.g., cell type; the task then pools the cells of each cell type within each sample, producing expression values for all features for every sample and cell type combination.

Aggregation method options are Sum, Maximum, Mean, and Median. Expression values for cells from the same sample with the same cell type classification are merged using the chosen method. If the input data node contains raw counts, Sum is recommended; if it contains normalized counts, Mean or Median is more appropriate.
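The pooling described above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory list of cells, each carrying a sample name, a cell type label, and per-gene counts, with every cell measuring the same genes. It is not Partek Flow's implementation; the function name `pseudobulk`, the `AGGREGATORS` table, and the toy data are illustrative only.

```python
from statistics import mean, median

# Hypothetical toy data (not Partek Flow's internal format): each cell has a
# sample name, a cell type classification, and raw counts per gene.
cells = [
    {"sample": "S1", "cell_type": "T cell", "counts": {"CD3E": 5, "MS4A1": 0}},
    {"sample": "S1", "cell_type": "T cell", "counts": {"CD3E": 3, "MS4A1": 1}},
    {"sample": "S1", "cell_type": "B cell", "counts": {"CD3E": 0, "MS4A1": 7}},
    {"sample": "S2", "cell_type": "T cell", "counts": {"CD3E": 4, "MS4A1": 0}},
]

# One callable per aggregation method option (Sum, Maximum, Mean, Median).
AGGREGATORS = {"sum": sum, "max": max, "mean": mean, "median": median}


def pseudobulk(cells, group_by=("sample",), method="sum"):
    """Pool cells sharing the same group_by values into one expression profile."""
    aggregate = AGGREGATORS[method]
    groups = {}
    for cell in cells:
        # Cells with identical values for every grouping attribute are pooled.
        key = tuple(cell[attr] for attr in group_by)
        groups.setdefault(key, []).append(cell["counts"])
    profiles = {}
    for key, members in groups.items():
        features = members[0].keys()
        profiles[key] = {f: aggregate([m[f] for m in members]) for f in features}
    return profiles


# Raw counts as input, so Sum is used, grouping by sample and cell type.
for key, profile in pseudobulk(cells, ("sample", "cell_type"), "sum").items():
    print(key, profile)
```

Grouping by `("sample",)` alone mirrors the default of pooling all cells in a sample; adding `"cell_type"` yields one profile per sample and cell type combination.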

After clicking Finish, a Pseudobulk data node will be generated which contains bulk level expression data.

Split by feature type

Split matrix can be invoked on any counts data node with more than one feature type (multiomics data). For example, the single cell counts data node from a CITE-Seq experiment contains both Gene Expression counts and Antibody Capture counts. This task splits the different feature measurements into separate data nodes for downstream analysis.

There are no parameters to configure. To run the task:

Click the counts data node you want to split
Click the Pre-analysis tools section of the toolbox
Click Split by feature type

The Split matrix task will run and generate output data nodes for each of the feature types. For example, if there are Antibody Capture and Gene Expression feature types in the input, Split matrix will generate two data nodes. Every sample is included in both matrices.
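Conceptually, the split partitions the features by their feature type while keeping every cell. The minimal Python sketch below assumes a hypothetical dictionary that maps each feature to its feature type and per-cell counts; the feature name `CD3_ADT`, the function `split_by_feature_type`, and the data are illustrative and do not reflect how Partek Flow stores counts.

```python
# Hypothetical CITE-Seq-style toy matrix: each feature maps to its feature
# type and its counts in three cells (cell1, cell2, cell3).
features = {
    "CD3E": ("Gene Expression", [5, 3, 0]),
    "MS4A1": ("Gene Expression", [0, 1, 7]),
    "CD3_ADT": ("Antibody Capture", [120, 98, 4]),
}


def split_by_feature_type(features):
    """Return one {feature: counts} matrix per feature type."""
    parts = {}
    for name, (feature_type, counts) in features.items():
        # Counts are kept for every cell, so each output covers all cells.
        parts.setdefault(feature_type, {})[name] = counts
    return parts


for feature_type, matrix in split_by_feature_type(features).items():
    print(feature_type, sorted(matrix))
```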

Generate group cell counts

If a single cell data node contains cell attribute information, e.g., clustering results, classifications, or imported attributes, a counts-type data node containing the number of cells from each attribute group for each sample can be generated and used for downstream analysis.

To invoke Generate group cell counts:

Click a single cell counts data node with cell-level attribute information
Click Pre-analysis tools in the toolbox
Click Generate group cell counts
Select the attribute to group the cells from the Group by drop-down menu and click the + button

Click Finish

A Group cell counts data node will be generated. The data node contains a matrix of the number of cells in each sample for each group. You can view the counts in the Group cell counts report.

The Group cell counts data node is a counts type data node and downstream analysis tasks, such as normalization, PCA, and ANOVA, can be used to analyze the group cell counts data.
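The counting itself is a simple tally of cells per sample and group. The minimal Python sketch below assumes hypothetical per-cell `(sample, group)` pairs; the function name `group_cell_counts` and the data are illustrative, not part of Partek Flow. Groups absent from a sample are reported as 0 so that every sample has a value for every group, as a counts matrix requires.

```python
from collections import Counter

# Hypothetical per-cell annotations: (sample, group) pairs, where the group
# comes from a cell attribute such as a classification or cluster.
cell_annotations = [
    ("S1", "T cell"), ("S1", "T cell"), ("S1", "B cell"),
    ("S2", "T cell"), ("S2", "NK cell"),
]


def group_cell_counts(annotations):
    """Return {group: {sample: number of cells}}, with 0 for absent groups."""
    tally = Counter(annotations)
    samples = sorted({sample for sample, _ in annotations})
    groups = sorted({group for _, group in annotations})
    # Counter returns 0 for missing keys, so every sample gets an entry.
    return {g: {s: tally[(s, g)] for s in samples} for g in groups}


for group, per_sample in group_cell_counts(cell_annotations).items():
    print(group, per_sample)
```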

Merge matrices

In complex projects, different data matrices (e.g., observations on rows and features on columns) may need to be merged to achieve the analysis goals. For example, if two cell populations were identified on separate branches of the analysis pipeline, their expression matrices have to be combined before any joint downstream steps. Alternatively, if two assays (e.g., gene expression and protein expression) were performed on the same cells, the expression matrices have to be merged for joint analysis.

The Merge matrices task is located in the Pre-analysis tools section of the toolbox and handles two scenarios: Merge cells/samples and Merge features. To start, select the first data node on the pipeline (e.g., single cell counts) and then select the Merge matrices task.

Merge Cells/Samples

To use the Merge cells option, the data matrices (one or more) to be merged with the currently selected one should have the same features (e.g., genes) but distinct cells. Push the Select data nodes button to display a preview of the pipeline; the data nodes that can be merged are shown in the color of the branch, and other data nodes are disabled (greyed out). Left-click the data node that you want to merge with the current one and click the Select button. You can select multiple data nodes to merge. The selected node(s) will be shown under the Select data nodes button. If you made a mistake, use the Clear selection icon. Push Finish to proceed.
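The requirement for the Merge cells option (same features, distinct cells) can be expressed as a row-wise concatenation with two checks. The minimal Python sketch below assumes hypothetical matrices stored as `{cell: {feature: value}}` dictionaries; the names `feature_set` and `merge_cells` are illustrative and are not Partek Flow functions.

```python
def feature_set(matrix):
    """Features present in a hypothetical {cell: {feature: value}} matrix."""
    return set().union(*(row.keys() for row in matrix.values()))


def merge_cells(first, *others):
    """Combine matrices with the same features but distinct cells."""
    features = feature_set(first)
    merged = dict(first)
    for matrix in others:
        if feature_set(matrix) != features:
            raise ValueError("matrices must have the same features")
        shared = merged.keys() & matrix.keys()
        if shared:
            raise ValueError(f"cells found in more than one matrix: {sorted(shared)}")
        merged.update(matrix)
    return merged


# Two hypothetical cell populations from separate pipeline branches.
t_cells = {"c1": {"CD3E": 5, "MS4A1": 0}, "c2": {"CD3E": 3, "MS4A1": 1}}
b_cells = {"c3": {"CD3E": 0, "MS4A1": 7}}
combined = merge_cells(t_cells, b_cells)
print(sorted(combined))
```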

Merge Features

To use the Merge features option, the data matrices (one or more) to be merged with the currently selected one should have the same cells (or samples) but distinct features (e.g., gene and protein expression). Push the Select data nodes button to display a preview of the pipeline; the data nodes that can be merged are shown in the color of the branch, and others are disabled (greyed out). Left-click the data node that you want to merge with the current one and push the Select button. The selected node will be shown under the Select data nodes button. Repeat the procedure if you would like to merge additional nodes. If you made a mistake, use the Clear selection icon. Push Finish to proceed.
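The Merge features option is the column-wise counterpart: the cells must match, and the feature sets must not overlap. As before, this is a minimal Python sketch assuming hypothetical `{cell: {feature: value}}` matrices; `merge_features`, `feature_set`, and the feature name `CD3_ADT` are illustrative, not Partek Flow's API.

```python
def feature_set(matrix):
    """Features present in a hypothetical {cell: {feature: value}} matrix."""
    return set().union(*(row.keys() for row in matrix.values()))


def merge_features(first, *others):
    """Combine matrices with the same cells but distinct features."""
    cells = set(first)
    merged = {cell: dict(row) for cell, row in first.items()}
    for matrix in others:
        if set(matrix) != cells:
            raise ValueError("matrices must contain the same cells")
        shared = feature_set(merged) & feature_set(matrix)
        if shared:
            raise ValueError(f"features found in more than one matrix: {sorted(shared)}")
        for cell, row in matrix.items():
            merged[cell].update(row)
    return merged


# Hypothetical gene and protein measurements on the same two cells.
rna = {"c1": {"CD3E": 5}, "c2": {"CD3E": 0}}
adt = {"c1": {"CD3_ADT": 120}, "c2": {"CD3_ADT": 4}}
combined = merge_features(rna, adt)
print(combined)
```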

The output of the Merge matrices task is a Merged counts data node.