Integrating scRNA-Seq and scATAC-Seq Data: A Primer
Single-cell sequencing technologies have revolutionized our understanding of cellular heterogeneity. Among these technologies, scRNA-Seq and scATAC-Seq stand out for their ability to profile gene expression and chromatin accessibility, respectively. But how can we integrate these two types of data to gain a more comprehensive view of cellular states? Let’s dive in!
Other tutorial: Seurat tutorial
Understanding the Data
- scRNA-Seq: Provides gene expression levels in individual cells. The resulting matrix has genes as rows and cells as columns, with values representing gene expression levels.
- scATAC-Seq: Profiles chromatin accessibility at specific genomic regions. The resulting matrix has genomic regions (peaks) as rows and cells as columns, with values that count the fragments overlapping each peak; these counts are often binarized to indicate whether a region is accessible.
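To make the scATAC-Seq layout concrete, here is a minimal sketch in plain Python of a toy peak-by-cell matrix and a binarization step. The peak names, cell names, and counts are made up for illustration, and `binarize` is a hypothetical helper, not a function from Seurat or any other package.

```python
# Toy peak-by-cell matrix: rows are peaks, columns are cells.
# All names and values below are invented for illustration.
peak_names = ["chr1:900-1400", "chr1:30000-30500"]
cell_names = ["cell1", "cell2", "cell3"]
counts = [
    [1, 0, 2],  # fragments overlapping the first peak in each cell
    [0, 3, 0],  # fragments overlapping the second peak in each cell
]


def binarize(matrix):
    """Convert counts to 0/1 accessibility calls (1 = at least one fragment)."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]


binary = binarize(counts)
for name, row in zip(peak_names, binary):
    print(name, dict(zip(cell_names, row)))
```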
The Challenge
At first glance, these matrices seem incompatible. One provides gene-centric information, while the other is focused on genomic regions. So, how can we integrate them?
From Peaks to Genes
A common approach is to associate scATAC-Seq peaks with nearby genes. This can transform the scATAC-Seq matrix into a gene-by-cell matrix, similar to scRNA-Seq. Strategies include:
- Assigning each peak to the nearest gene’s transcription start site (TSS).
- Using tools that provide more sophisticated peak-to-gene assignment methods.
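The nearest-TSS strategy can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the genes, TSS positions, peaks, and counts are invented, the 100 kb distance cutoff (`max_dist`) is an arbitrary choice, and the helper names are hypothetical. Dedicated tools use more refined rules.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy annotation: gene -> (chromosome, TSS position).
TSS = {
    "GeneA": ("chr1", 1_000),
    "GeneB": ("chr1", 50_000),
    "GeneC": ("chr2", 20_000),
}

# Hypothetical peaks as (chromosome, start, end).
PEAKS = [
    ("chr1", 900, 1_400),
    ("chr1", 30_000, 30_500),
    ("chr1", 49_000, 49_600),
    ("chr2", 500_000, 500_400),
]

# Peak-by-cell counts, rows in the same order as PEAKS (3 cells).
PEAK_COUNTS = [
    [1, 0, 2],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]


def build_tss_index(tss):
    """Group TSS positions by chromosome and sort them for binary search."""
    index = defaultdict(list)
    for gene, (chrom, pos) in tss.items():
        index[chrom].append((pos, gene))
    for chrom in index:
        index[chrom].sort()
    return index


def nearest_gene(peak, tss_index, max_dist=100_000):
    """Return the gene whose TSS is closest to the peak midpoint, or None."""
    chrom, start, end = peak
    entries = tss_index.get(chrom, [])
    if not entries:
        return None
    mid = (start + end) // 2
    positions = [pos for pos, _ in entries]
    i = bisect_left(positions, mid)
    # Only the TSS just left and just right of the midpoint can be nearest.
    candidates = entries[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda entry: abs(entry[0] - mid))
    return gene if abs(pos - mid) <= max_dist else None


def gene_by_cell(peaks, counts, tss):
    """Sum the counts of all peaks assigned to the same gene."""
    index = build_tss_index(tss)
    n_cells = len(counts[0])
    activity = defaultdict(lambda: [0] * n_cells)
    for peak, row in zip(peaks, counts):
        gene = nearest_gene(peak, index)
        if gene is None:
            continue  # peak is too far from any TSS; drop it
        activity[gene] = [a + c for a, c in zip(activity[gene], row)]
    return dict(activity)


print(gene_by_cell(PEAKS, PEAK_COUNTS, TSS))
```

With this toy input, the first peak goes to GeneA, the second and third go to GeneB, and the chr2 peak is dropped because it lies more than 100 kb from the only TSS on that chromosome. The result is a gene-by-cell matrix with the same orientation as the scRNA-Seq matrix.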
Integration Using Latent Spaces
Tools like Seurat don’t directly merge the matrices. Instead, they:
- Select features (genes) that are highly variable and use them to project both datasets into a shared “latent space” of underlying patterns.
- Identify “anchors”: pairs of cells, one from each dataset, that are mutual nearest neighbors in that space.
- Use these anchors to align the datasets or to transfer information between them.
Once integrated, joint analyses, such as clustering, can identify cell types present in both datasets.
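The anchor idea can be illustrated with a minimal sketch, assuming both datasets have already been projected into a shared low-dimensional space. The coordinates are invented and `find_anchors` is a hypothetical helper that implements plain mutual nearest neighbors. It is not Seurat's implementation, which adds further filtering and scoring of anchors.

```python
import math

# Invented coordinates of cells in a shared 2-D latent space.
rna_cells = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]
atac_cells = [(0.1, 0.0), (5.1, 4.9), (9.0, 9.0)]


def nearest(query, reference, k):
    """Indices of the k reference points closest to `query` (Euclidean)."""
    order = sorted(range(len(reference)),
                   key=lambda j: math.dist(query, reference[j]))
    return set(order[:k])


def find_anchors(rna, atac, k=1):
    """Return (rna_index, atac_index) pairs that are mutual k-nearest neighbors."""
    rna_to_atac = [nearest(cell, atac, k) for cell in rna]
    atac_to_rna = [nearest(cell, rna, k) for cell in atac]
    return [
        (i, j)
        for i in range(len(rna))
        for j in sorted(rna_to_atac[i])
        if i in atac_to_rna[j]  # keep the pair only if the match is mutual
    ]


print(find_anchors(rna_cells, atac_cells))
```

Here the second RNA cell picks the first ATAC cell as its nearest neighbor, but that ATAC cell prefers the first RNA cell, so the pair is not mutual and is not kept as an anchor.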
Example Integration Workflow
From Peak to Seurat Object
A Seurat object for ATAC data needs more than the count matrix that suffices for RNA:
- The peak matrix, stored as a ChromatinAssay object.
- A chromosome annotation file, which is needed for gene activity estimation.
- The Fragment object.
Integration
Anchor identification
This step can take a long time.
Label transfer
After identifying anchors, we can transfer annotations from the scRNA-seq dataset onto the scATAC-seq cells. The annotations are stored in the seurat_annotations field and are provided as input to the refdata parameter. The output contains a matrix with predictions and confidence scores for each ATAC-seq cell.
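To show how anchors can turn into predictions and confidence scores, here is a simplified sketch in plain Python. It assumes each ATAC cell is scored by its nearest anchors, weighted by inverse distance, and that the winning label's share of the total weight serves as the confidence. The coordinates, labels, and the `transfer_labels` helper are hypothetical, and this weighting scheme is a stand-in rather than the one Seurat uses.

```python
import math
from collections import defaultdict

# Invented ATAC cell coordinates in the shared latent space.
atac_cells = [(0.1, 0.0), (5.1, 4.9), (4.0, 4.0)]
# Anchors as (rna_index, atac_index) pairs, e.g. from a previous anchor search.
anchors = [(0, 0), (2, 1)]
# Annotation of each RNA cell (the reference labels).
rna_labels = ["T cell", "T cell", "B cell"]


def transfer_labels(atac, anchor_pairs, labels, k=2, eps=1e-9):
    """Return one (predicted_label, score) tuple per ATAC cell."""
    results = []
    for cell in atac:
        # Use the k anchors whose ATAC member is closest to this cell.
        ranked = sorted(anchor_pairs,
                        key=lambda a: math.dist(cell, atac[a[1]]))[:k]
        scores = defaultdict(float)
        for rna_idx, atac_idx in ranked:
            # Closer anchors get more say; eps avoids division by zero.
            weight = 1.0 / (math.dist(cell, atac[atac_idx]) + eps)
            scores[labels[rna_idx]] += weight
        total = sum(scores.values())
        normalized = {label: s / total for label, s in scores.items()}
        best = max(normalized, key=normalized.get)
        results.append((best, normalized[best]))
    return results


for label, score in transfer_labels(atac_cells, anchors, rna_labels):
    print(label, round(score, 3))
```

Cells that sit on an anchor receive a score close to 1, while a cell lying between anchors with different labels receives a lower score, which is a useful signal for filtering uncertain predictions.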
Co-embedding scRNA-seq and scATAC-seq datasets
© Seurat
Conclusion
Integrating scRNA-Seq and scATAC-Seq data provides a holistic view of cellular states, combining gene expression and chromatin accessibility information. While the integration process might seem daunting, understanding the underlying principles and using the right tools can make it achievable and insightful.