Posted 2025-07-06, updated 2025-07-17

Reading Public scRNA-seq Data into Seurat

10x Genomics-style

Example: GSE163558

GSE163558
├── GSM5004188_Li1_barcodes.tsv.gz
├── GSM5004188_Li1_features.tsv.gz
├── GSM5004188_Li1_matrix.mtx.gz
...
├── GSM5004189_Li2_barcodes.tsv.gz
├── GSM5004189_Li2_features.tsv.gz
└── GSM5004189_Li2_matrix.mtx.gz

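Read10X expects one directory per sample, containing files named barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz. So we first move each sample's files into its own directory and strip the GSM/sample prefix from the file names.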

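# Run inside GSE163558/. File names look like GSM5004188_Li1_barcodes.tsv.gz
for f in GSM*_*.gz; do
    sample=$(echo "$f" | awk -F"_" '{print $2}')  # e.g. Li1
    name=$(echo "$f" | awk -F"_" '{print $3}')    # e.g. barcodes.tsv.gz
    mkdir -p "$sample"
    mv "$f" "$sample/$name"
done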
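Before moving anything, it can help to preview where each file would end up. The sketch below is one way to do that. `target_path` is a hypothetical helper name, not part of any tool, and it assumes that sample names contain no underscores, as in this dataset.

```shell
# Hypothetical helper: print the target path for one GEO file name, e.g.
# GSM5004188_Li1_barcodes.tsv.gz -> Li1/barcodes.tsv.gz
# Assumes the pattern <GSM accession>_<sample>_<10x file name>.
target_path() {
    local file="$1"
    local rest="${file#*_}"      # drop the GSM accession: Li1_barcodes.tsv.gz
    local sample="${rest%%_*}"   # text before the next underscore: Li1
    local name="${rest#*_}"      # everything after it: barcodes.tsv.gz
    printf '%s/%s\n' "$sample" "$name"
}

# Dry run (run inside GSE163558/): show the mapping without touching any file.
# for f in GSM*_*.gz; do echo "$f -> $(target_path "$f")"; done
```

If the printed targets look right, the actual move can be run with more confidence.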

After separating, the structure will look like this:

GSE163558
├── Li1
│   ├── barcodes.tsv.gz
│   ├── features.tsv.gz
│   └── matrix.mtx.gz
...
└── PT3
    ├── barcodes.tsv.gz
    ├── features.tsv.gz
    └── matrix.mtx.gz
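Read10X will fail on a sample directory that lacks one of the three files, so a quick completeness check can save a failed R session. This is a minimal sketch. `check_10x_dirs` is a hypothetical helper name, and it assumes every subdirectory of the dataset root is a sample.

```shell
# Hypothetical helper: list every sample directory under a dataset root
# that is missing one of the three files Read10X looks for.
check_10x_dirs() {
    local root="${1:-.}"
    local d f
    for d in "$root"/*/; do
        [ -d "$d" ] || continue   # skip the literal pattern if nothing matched
        for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
            [ -f "$d$f" ] || echo "missing: $d$f"
        done
    done
    return 0
}

# Usage (run inside GSE163558/): prints nothing if every sample is complete.
# check_10x_dirs .
```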

Now we can read each sample, convert it into a Seurat object, and merge the objects.

library(Seurat)

setwd("../GSE163558")
# Each sample is a subdirectory (Li1, ..., PT3)
samples <- list.dirs(full.names = FALSE, recursive = FALSE)

seurat_list <- list()
for (s in samples) {
  # Read the 10x files of one sample into a sparse count matrix
  counts <- Read10X(data.dir = s)
  # Use the directory name as the sample tag and store the Seurat object
  seurat_list[[s]] <- CreateSeuratObject(counts = counts, project = s)
}

# Merge all Seurat objects with cell prefix (to avoid name clashes)
seurat_obj <- merge(
  seurat_list[[1]],
  y = seurat_list[-1],
  add.cell.ids = names(seurat_list)
)

One expression table per sample

Example: GSE134520

In this dataset, each sample comes as a separate expression matrix in a plain-text file. We read each file, convert it into a Seurat object, and merge the objects.

GSE134520
├── GSM3954946_processed_NAG1.txt
├── GSM3954947_processed_NAG2.txt
├── GSM3954948_processed_NAG3.txt
...
├── GSM3954956_processed_IMS3.txt
├── GSM3954957_processed_IMS4.txt
└── GSM3954958_processed_EGC.txt
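Before loading these tables into R, it is worth confirming their layout. The sketch below compares the number of tab-separated fields in the header with the number in the first data row. `field_counts` is a hypothetical helper name, and the check assumes the files are tab-delimited with cells in columns and gene names in the first column, which is what a `read.table` call with `sep = "\t"` and `row.names = 1` relies on.

```shell
# Hypothetical helper: report the number of tab-separated fields in the
# header line and in the first data row of a text matrix.
field_counts() {
    awk -F'\t' 'NR == 1 { h = NF }
                NR == 2 { print "header=" h, "first_row=" NF; exit }' "$1"
}

# Usage (run inside GSE134520/):
# for f in GSM*_processed_*.txt; do echo "$f: $(field_counts "$f")"; done
```

Equal counts suggest the header includes a label for the gene column. A header with one field fewer suggests it lists only the cell barcodes. A count of 1 for both usually means the file is not tab-delimited.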

library(Seurat)
library(stringr)

setwd("GSE134520")
files <- list.files(pattern = "GSM.*\\.txt$")

seurat_list <- list()

for (f in files) {
  # Read one tab-delimited table (Seurat expects genes in rows, cells in columns)
  mat <- read.table(f, header = TRUE, row.names = 1, sep = "\t", check.names = FALSE)
  # Sample tag from the file name, e.g. "NAG1"; use element [1] for the GSM ID instead
  tag <- str_remove(str_split(f, "_")[[1]][3], "\\.txt$")
  # Convert the data frame to a matrix, create the Seurat object, and store it
  seurat_list[[tag]] <- CreateSeuratObject(counts = as.matrix(mat), project = tag)
}

# Merge all Seurat objects with cell prefix (to avoid name clashes)
seurat_obj <- merge(
  seurat_list[[1]],
  y = seurat_list[-1],
  add.cell.ids = names(seurat_list)
)