# Third-party CRISPR Guide Assignment Toolkit

[Crispat](https://academic.oup.com/bioinformatics/article/40/9/btae535/7750392) is a peer-reviewed CRISPR guide calling package that bundles many popular gRNA thresholding approaches, so users can compare their performance side by side.

Here, we implement one of these approaches, the Gaussian-Gaussian mixture model, which we have found performs best on data generated with Illumina's Single-Cell CRISPR Prep kit.

Instructions for downloading and installing our optimized fork of crispat can be found [here](https://github.com/aamayzhang/crispat_iscp/tree/main).

In brief, the workflow does the following:

* Reads in DRAGEN filtered matrix files from a DRAGEN output directory.
* Creates and reads in a .h5ad file (via scanpy).
* Performs 2-component Gaussian-Gaussian mixture modeling on the gRNA counts.
* Outputs gRNA assignments for each cell barcode along with plots showing the fit of the model to the raw counts.
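
The modeling step can be illustrated with a short plain-Python sketch. This is a hypothetical illustration, not crispat's implementation: it assumes each gRNA is modeled independently, fits two Gaussians to the log2-transformed nonzero read counts with expectation-maximization, and assigns a cell when the posterior probability of the high-count component is at least 0.5. The function names, the log2 transform, and the 0.5 cutoff are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev

LOG_SQRT_2PI = 0.5 * math.log(2 * math.pi)


def log_normal_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - LOG_SQRT_2PI


def component_log_probs(x, weights, mus, sigmas):
    """log(weight * density) for the low (index 0) and high (index 1) components."""
    return [math.log(w) + log_normal_pdf(x, m, s)
            for w, m, s in zip(weights, mus, sigmas)]


def posterior_high(x, weights, mus, sigmas):
    """Posterior probability that x came from the high-count component."""
    a, b = component_log_probs(x, weights, mus, sigmas)
    # Logistic of the log-odds (b - a), arranged so exp() never overflows.
    if b >= a:
        return 1.0 / (1.0 + math.exp(a - b))
    e = math.exp(b - a)
    return e / (1.0 + e)


def fit_two_gaussians(values, n_iter=500, tol=1e-9, min_sigma=1e-3):
    """Fit a 2-component 1-D Gaussian mixture by expectation-maximization."""
    xs = sorted(values)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 nonzero values to fit two components")
    # Initialise the components from the lower and upper halves of the data.
    low, high = xs[: n // 2], xs[n // 2:]
    weights = [0.5, 0.5]
    mus = [mean(low), mean(high)]
    sigmas = [max(pstdev(low), min_sigma), max(pstdev(high), min_sigma)]
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of the high component for every point.
        resp = [posterior_high(x, weights, mus, sigmas) for x in xs]
        # M-step: re-estimate weights, means and standard deviations.
        n1 = max(sum(resp), 1e-12)
        n0 = max(n - n1, 1e-12)
        w1 = min(max(n1 / n, 1e-6), 1 - 1e-6)
        weights = [1 - w1, w1]
        mu0 = sum((1 - r) * x for r, x in zip(resp, xs)) / n0
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        var0 = sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1
        mus = [mu0, mu1]
        sigmas = [max(math.sqrt(var0), min_sigma), max(math.sqrt(var1), min_sigma)]
        # Stop once the log-likelihood no longer improves meaningfully.
        ll = 0.0
        for x in xs:
            a, b = component_log_probs(x, weights, mus, sigmas)
            m = max(a, b)
            ll += m + math.log(math.exp(a - m) + math.exp(b - m))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # Make sure index 1 is the higher-mean ("guide present") component.
    if mus[0] > mus[1]:
        weights, mus, sigmas = weights[::-1], mus[::-1], sigmas[::-1]
    return weights, mus, sigmas


def assign_guide(counts, min_posterior=0.5):
    """Return {cell: read_count} for cells assigned to one gRNA.

    `counts` maps cell barcode -> read count for ONE gRNA. Zero counts are
    dropped and the mixture is fit on log2 of the nonzero counts (assumption).
    """
    nonzero = {cell: c for cell, c in counts.items() if c > 0}
    logs = {cell: math.log2(c) for cell, c in nonzero.items()}
    weights, mus, sigmas = fit_two_gaussians(list(logs.values()))
    return {cell: nonzero[cell] for cell, x in logs.items()
            if posterior_high(x, weights, mus, sigmas) >= min_posterior}


# Usage with synthetic counts: 40 background cells (1-4 reads) and 10 true hits.
background = {f"bg{i}": 1 + i % 4 for i in range(40)}
hits = {f"hit{i}": 150 + i for i in range(10)}
assigned = assign_guide({**background, **hits})
print(sorted(assigned))
```

With the synthetic data above, the high component settles on the ten high-count cells and only those are assigned. Working on log2 counts keeps the background and signal peaks on comparable scales; a library implementation such as crispat's handles initialization and convergence more carefully than this sketch.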

A [step-by-step tutorial](https://github.com/aamayzhang/crispat_iscp/blob/main/tutorial/iscp_demo.py) and a [small test dataset](https://github.com/aamayzhang/crispat_iscp/tree/main/example_data) can be found in the repo.

You should expect the plots of the model's fit to the nonzero count data to look similar to this:

<figure><img src="https://2107948471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqVEYIKB8JFfdScsTocFN%2Fuploads%2F14YCTnDkfgdwwENpTJUM%2Fgaussian%20mixture%20model.png?alt=media&#x26;token=60f4bdf1-4077-4ecd-9ed0-a1a82b26ff85" alt=""><figcaption></figcaption></figure>

The output file *assignments.csv* lists every cell barcode with a guide whose read count exceeded the threshold determined by the Gaussian mixture model, one row per cell-gRNA assignment. The output should look like this:

<div align="left"><figure><img src="https://2107948471-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqVEYIKB8JFfdScsTocFN%2Fuploads%2Fe8DGfzFYAO2RVzoqcmjg%2Fassignments%20csv.png?alt=media&#x26;token=3f4be8d0-c58f-4250-8aaf-90ef6289abf6" alt=""><figcaption></figcaption></figure></div>

* `cell`: the cell barcode sequence.
* `gRNA`: the ID of the gRNA assigned to the cell.
* `read_counts`: the number of reads observed for the cell-gRNA assignment.
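
A quick way to sanity-check *assignments.csv* is to group its rows by cell and separate cells that received exactly one guide from cells assigned several. The sketch below uses only the Python standard library and relies on the three columns described above; the barcodes and guide IDs in the example are made up, and `read_counts` is assumed to hold integers.

```python
import csv
import io
from collections import defaultdict


def load_assignments(handle):
    """Group rows of assignments.csv by cell: {cell: {gRNA: read_counts}}."""
    per_cell = defaultdict(dict)
    for row in csv.DictReader(handle):
        # Columns are looked up by name, so an extra index column is ignored.
        per_cell[row["cell"]][row["gRNA"]] = int(row["read_counts"])
    return dict(per_cell)


def single_guide_cells(per_cell):
    """Return {cell: gRNA} for cells that were assigned exactly one guide."""
    return {cell: next(iter(guides))
            for cell, guides in per_cell.items() if len(guides) == 1}


# Hypothetical rows in the assignments.csv layout; with a real file, use
# open("assignments.csv", newline="") instead of StringIO.
example = io.StringIO(
    "cell,gRNA,read_counts\n"
    "AAACCTGA,gRNA_A,120\n"
    "AAACCTGC,gRNA_B,95\n"
    "AAACCTGC,gRNA_C,88\n"
)
per_cell = load_assignments(example)
print(single_guide_cells(per_cell))  # {'AAACCTGA': 'gRNA_A'}
```

Cells that appear with more than one `gRNA` (here `AAACCTGC`) can then be inspected or excluded, depending on how multi-guide cells are handled downstream.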
