# Transcript Counting

Within each barcode and gene combination, IMIs are grouped into one of 64 bins based on the 3-base binning index (4^3 = 64 possible values). Within each bin, all identical IMIs are collapsed into a single count, since they are likely PCR duplicates of the same fragment generated during library prep.
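The grouping and collapsing step above can be sketched as follows. This is a minimal illustration rather than the pipeline's implementation: the `(barcode, gene, bin_index, imi)` record layout and the function names `collapse_imis` and `summarize` are hypothetical, and the sketch assumes the IMI total used downstream is the count of collapsed (unique) IMIs.

```python
from collections import defaultdict


def collapse_imis(reads):
    """Collapse identical IMIs within each (barcode, gene, bin) group.

    `reads` is an iterable of (barcode, gene, bin_index, imi) tuples.
    This record layout is hypothetical and used only for illustration.
    """
    groups = defaultdict(set)
    for barcode, gene, bin_index, imi in reads:
        # A set keeps one copy of each IMI, so duplicates collapse to one.
        groups[(barcode, gene, bin_index)].add(imi)
    return groups


def summarize(groups):
    """Return {(barcode, gene): (unique_bins, total_collapsed_imis)}."""
    bins = defaultdict(int)
    imis = defaultdict(int)
    for (barcode, gene, _bin_index), imi_set in groups.items():
        bins[(barcode, gene)] += 1             # one more occupied bin
        imis[(barcode, gene)] += len(imi_set)  # collapsed IMIs in that bin
    return {key: (bins[key], imis[key]) for key in bins}


reads = [
    ("BC1", "GAPDH", "ACG", "IMI_A"),
    ("BC1", "GAPDH", "ACG", "IMI_A"),  # identical IMI in the same bin
    ("BC1", "GAPDH", "ACG", "IMI_B"),
    ("BC1", "GAPDH", "TTC", "IMI_C"),
    ("BC2", "GAPDH", "ACG", "IMI_A"),
]
summary = summarize(collapse_imis(reads))
```

In this toy input, barcode `BC1` and gene `GAPDH` end up with two occupied bins and three collapsed IMIs, because the repeated `IMI_A` in bin `ACG` is counted once.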

Any barcode and gene combination that has ten or fewer unique binning indexes is assigned the number of unique binning indexes as its final count estimate. For each remaining barcode and gene combination, the pipeline totals the number of associated IMIs and divides that total by the IPM correction factor, which accounts for the additional copies generated from a single captured molecule during five amplification cycles. The final count is the maximum of the floor of this value and the number of unique binning indexes for that barcode and gene.
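The counting rule described above can be written as a short function. This is a sketch under stated assumptions: the name `final_count` and its parameters are hypothetical, and the IPM correction factor is taken as an already-estimated input.

```python
import math

# Threshold from the text: ten or fewer unique binning indexes
# means the bin count itself is used as the final estimate.
BIN_THRESHOLD = 10


def final_count(unique_bins, total_imis, ipm):
    """Final molecule count for one barcode and gene combination.

    unique_bins: number of unique binning indexes observed
    total_imis:  total IMIs for the combination
    ipm:         IPM correction factor (assumed to be estimated elsewhere)
    """
    if unique_bins <= BIN_THRESHOLD:
        return unique_bins
    # IPM-corrected estimate, never lower than the number of unique bins.
    return max(math.floor(total_imis / ipm), unique_bins)
```

For example, with a hypothetical IPM of 3.0, `final_count(4, 40, 3.0)` returns 4 because the combination is at or below the threshold, while `final_count(20, 90, 3.0)` returns 30 from the corrected estimate.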

Because all IMIs from the same parent molecule share a binning index, the number of unique binning indexes observed within a specific barcode and gene is determined by the number of molecules and is not affected by the number of IMIs those molecules produced. This means that the probabilistic relationship between the number of unique bins and the true number of molecules in a barcode and gene combination is constant: it results from random sampling of the 64 possible binning indexes when each molecule is captured. For the subset of barcode and gene combinations with between 5 and 32 unique binning indexes, dividing the total number of IMIs by the average number of molecules expected for the observed number of unique binning indexes gives the estimated average IMIs per molecule (IPM).
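One way to make this relationship concrete is the standard occupancy model: if each of n molecules draws one of 64 bins uniformly at random, the expected number of occupied bins is 64 × (1 − (63/64)^n). The sketch below inverts that expression to get a molecule estimate from an observed bin count, then pools the qualifying combinations to estimate IPM. Both choices are assumptions made for illustration: the pipeline may use a different estimator for the expected molecules, the 5 to 32 range is treated as inclusive, and the names `expected_molecules` and `estimate_ipm` are hypothetical.

```python
import math

N_BINS = 64  # 4**3 possible 3-base binning indexes


def expected_molecules(unique_bins, n_bins=N_BINS):
    """Estimate molecules from occupied bins by inverting
    E[bins] = n_bins * (1 - (1 - 1/n_bins) ** n).

    This method-of-moments inversion is an assumption, not necessarily
    the estimator the pipeline uses.
    """
    if not 0 <= unique_bins < n_bins:
        raise ValueError("unique_bins must be in [0, n_bins)")
    return math.log(1 - unique_bins / n_bins) / math.log(1 - 1 / n_bins)


def estimate_ipm(combos, lo=5, hi=32):
    """Estimate IPM from (unique_bins, total_imis) pairs.

    Only combinations with lo <= unique_bins <= hi contribute
    (inclusive bounds are an assumption). IMIs and expected molecules
    are pooled across those combinations before dividing.
    """
    imis = 0
    molecules = 0.0
    for unique_bins, total_imis in combos:
        if lo <= unique_bins <= hi:
            imis += total_imis
            molecules += expected_molecules(unique_bins)
    if molecules == 0:
        raise ValueError("no combinations in the requested bin range")
    return imis / molecules
```

Under this model the estimate exceeds the observed bin count as soon as collisions become likely: 32 occupied bins correspond to roughly 44 molecules, which illustrates why raw bin counts undercount at higher occupancy.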

The estimated molecular count for a barcode and gene is the total number of IMIs divided by the IPM, rounded down. The more true molecules a barcode and gene combination has, the more closely its true average IMIs per molecule should approach the average IPM of the sample. For barcode and gene combinations with very few molecules, the number of unique bins is expected to predict the molecular count better than the number of IMIs does: with so few molecules in each combination, the variance in the true IMIs per molecule is high. For this reason, IPM correction is applied only to barcode and gene combinations with more than 10 unique binning indexes; otherwise, the corrected count equals the number of unique binning indexes.

<figure><img src="https://2107948471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqVEYIKB8JFfdScsTocFN%2Fuploads%2FAM6AlvXRWVBMOhpjsuP7%2Fdragen_error-correction.avif?alt=media&#x26;token=9a27a621-cf40-4632-b47e-b452b5bee435" alt=""><figcaption></figcaption></figure>

For more information about transcript counting, refer to the [DRAGEN documentation](https://help.dragen.illumina.com/product-guide/dragen-v4.4/dragen-single-cell-pipeline/dragen-scrna-pipseq#estimating-the-correction-factor) for the DRAGEN PIPseq scRNA Pipeline.
