Reading Public scRNA-seq Data into Seurat

10x Genomics-style

Example: GSE163558

GSE163558
├── GSM5004188_Li1_barcodes.tsv.gz
├── GSM5004188_Li1_features.tsv.gz
├── GSM5004188_Li1_matrix.mtx.gz
...
├── GSM5004189_Li2_barcodes.tsv.gz
├── GSM5004189_Li2_features.tsv.gz
└── GSM5004189_Li2_matrix.mtx.gz

Based on this pattern, we first need to separate the files into one directory per sample.

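# Create one directory per sample (second "_"-separated field, e.g. Li1)
for i in $(ls *.gz | awk -F"_" '{print $2}' | sort -u); do
  mkdir -p "$i"
done

# Move each file into its sample directory, dropping the GSM and sample prefix
for i in *.gz; do
  NAME=$(echo "$i" | awk -F"_" '{print $2"/"$3}')
  mv "$i" "$NAME"
done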
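Because the loop above moves and renames files in place, it can be worth previewing the result first. The sketch below is a dry run under the same naming assumption (GSM ID, sample name, and file type separated by underscores). `plan_moves` is a hypothetical helper name used only for illustration.

```shell
# Hypothetical helper: print the planned "source -> destination" pairs
# for the current directory without moving any file.
plan_moves() {
  for f in *_*_*.gz; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    sample=$(echo "$f" | awk -F"_" '{print $2}')
    target=$(echo "$f" | awk -F"_" '{print $3}')
    echo "$f -> $sample/$target"
  done
}
```

Running `plan_moves` inside the GSE163558 directory lists one line per file. A sample name that itself contains an underscore would be split in the wrong place, and that would show up here before any file is touched.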

After separating, the structure will look like this:

GSE163558
├── Li1
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
...
└── PT3
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz
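Before loading, a quick check that every sample directory contains all three files from the layout above can save a failed read later. This is a minimal sketch; `check_10x_dirs` is a hypothetical helper name, and it assumes every subdirectory of the current directory is a sample.

```shell
# Hypothetical helper: report sample directories that lack any of the
# three files shown in the layout above. Returns non-zero if any is missing.
check_10x_dirs() {
  status=0
  for d in */; do
    [ -d "$d" ] || continue   # skip when there are no subdirectories
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
      if [ ! -f "$d$f" ]; then
        echo "missing: $d$f"
        status=1
      fi
    done
  done
  return "$status"
}
```

Run `check_10x_dirs` from inside the GSE163558 directory. It prints nothing when every sample is complete.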

Now we can read each sample directory, convert it into a Seurat object, and merge the objects.

library(Seurat)

setwd("../GSE163558")
# Each sample is a subdirectory holding the barcodes, features, and matrix files
samples <- list.dirs(full.names = FALSE, recursive = FALSE)

seurat_list <- list()
for (s in samples) {
  # Read the 10x files in this sample directory
  counts <- Read10X(data.dir = s)
  # Use the directory name (e.g. Li1) as the sample tag and project name
  seurat_list[[s]] <- CreateSeuratObject(counts = counts, project = s)
}

# Merge all Seurat objects, prefixing cell names with the sample tag (to avoid name clashes)
seurat_obj <- merge(
  seurat_list[[1]],
  y = seurat_list[-1],
  add.cell.ids = names(seurat_list)
)

Separate Data Frames

Example: GSE134520

In this dataset, each sample has its own expression matrix in a separate text file. We read each file, convert it into a Seurat object, and merge the objects.

GSE134520
├── GSM3954946_processed_NAG1.txt
├── GSM3954947_processed_NAG2.txt
├── GSM3954948_processed_NAG3.txt
...
├── GSM3954956_processed_IMS3.txt
├── GSM3954957_processed_IMS4.txt
└── GSM3954958_processed_EGC.txt

library(Seurat)
library(stringr)

setwd("GSE134520")
files <- list.files(pattern = "^GSM.*\\.txt$")

seurat_list <- list()

for (f in files) {
  # Read the tab-separated expression table (genes in rows, cells in columns)
  mat <- read.table(f, header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)
  # Use the sample name from the filename as the tag, e.g. NAG1
  # (or str_split(f, "_")[[1]][1] for the GSM ID)
  tag <- sub("\\.txt$", "", str_split(f, "_")[[1]][3])
  # Convert the data frame to a matrix, create the Seurat object, and store it
  seurat_list[[tag]] <- CreateSeuratObject(counts = as.matrix(mat), project = tag)
}

# Merge all Seurat objects, prefixing cell names with the sample tag (to avoid name clashes)
seurat_obj <- merge(
  seurat_list[[1]],
  y = seurat_list[-1],
  add.cell.ids = names(seurat_list)
)
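Per-sample text tables like these can be large, so a quick look at their shape from the shell before loading them into R may help spot a truncated or differently formatted file. The sketch below is one way to do that; `table_dims` is a hypothetical helper name, and it assumes tab-separated files matching `GSM*.txt` in the current directory.

```shell
# Hypothetical helper: for each processed table, print the file name,
# the total number of lines, and the number of tab-separated fields
# in the header line.
table_dims() {
  for f in GSM*.txt; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    awk -F'\t' -v name="$f" \
      'NR == 1 { fields = NF } END { print name, NR, fields }' "$f"
  done
}
```

Whether the header field count equals the number of cells or the number of cells plus one depends on whether the header line includes a label for the gene column, so treat the numbers as a consistency check across files rather than exact dimensions.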

https://karobben.github.io/2025/07/06/Bioinfor/scRNA-read/

Author: Karobben
Posted on: 2025-07-06
Updated on: 2025-07-17
