Single-cell RNA-seq analysis workflow for T cell populations from GEO dataset GSE179994.
This repository reproduces preprocessing and clustering of T cell scRNA-seq data and creates post-treatment CD4 and CD8 subsets for downstream comparative analysis between responders and non-responders to immunotherapy.
Repository contents:
- README.md: project overview and run instructions.
- Scripts/SingleCellFunctions.R: reusable helper functions.
- Scripts/scRNAseq.R: main analysis pipeline.
- Scripts/run_analysis.R: one-command entrypoint.
- Scripts/proportion_analysis.ipynb: downstream proportion analysis notebook.
During execution, outputs are written to Figures/ and Results/.
Requirements:
- R 4.3.0 or newer.
- Seurat, dplyr, scuttle, ggplot2.
- Optional for AnnData export: SeuratDisk.
Install packages if needed:

    install.packages(c("Seurat", "dplyr", "ggplot2"))
    # scuttle is distributed through Bioconductor, not CRAN
    install.packages("BiocManager")
    BiocManager::install("scuttle")

Install SeuratDisk if you want .h5ad outputs:

    install.packages("remotes")
    remotes::install_github("mojaveazure/seurat-disk")

Expected input files in repository root or GSE179994_RAW/:
- GSE179994_all.Tcell.rawCounts.rds
- GSE179994_Tcell.metadata.tsv
Required metadata column:
- cellid
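A quick pre-flight check can catch a missing file or column before the pipeline starts. The sketch below is not taken from the repository's scripts: the helper names `find_input` and `check_metadata` are hypothetical, and it assumes the metadata file is tab-separated with a header row and that the count matrix stores cells as columns.

```r
# Hypothetical helpers (not part of the repository) for checking inputs.

# Look for a file in the repository root first, then in GSE179994_RAW/.
find_input <- function(filename, dirs = c(".", "GSE179994_RAW")) {
  candidates <- file.path(dirs, filename)
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) {
    stop("Input file not found: ", filename)
  }
  hits[1]
}

# Confirm that the metadata table carries the required 'cellid' column and
# return how many of its cell IDs also appear in the count matrix.
check_metadata <- function(metadata, count_cell_ids) {
  if (!"cellid" %in% colnames(metadata)) {
    stop("Metadata is missing the required 'cellid' column")
  }
  sum(metadata$cellid %in% count_cell_ids)
}

# Usage, run from the repository root:
# counts   <- readRDS(find_input("GSE179994_all.Tcell.rawCounts.rds"))
# metadata <- read.delim(find_input("GSE179994_Tcell.metadata.tsv"))
# check_metadata(metadata, colnames(counts))  # assumes cells are columns
```

If the returned overlap is much smaller than the number of cells in the count matrix, the identifiers in the two files probably do not match and are worth inspecting before running the full analysis.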
From repository root:

    Rscript Scripts/run_analysis.R

Pipeline steps:
- Load counts and metadata.
- Build and preprocess Seurat object.
- Add treatment labels (Pre/Post from sample name).
- Run clustering and UMAP.
- Subset post-treatment CD4 and CD8 populations.
- Flag low-quality cells using 2 MAD outlier thresholds.
- Harmonize metadata fields for AnnData compatibility (cluster, Response when present).
- Save figures, RDS outputs, and export CD4/CD8 post-treatment subsets to h5ad (if SeuratDisk is installed).
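As an illustration of the treatment-labeling step, the sketch below derives a Pre/Post label from a sample name in base R. It is not the repository's implementation: the helper name `label_treatment` is hypothetical, the sample names are made up, and it assumes that sample names contain the substring "pre" or "post" (case-insensitive).

```r
# Hypothetical helper: derive a Pre/Post label from a sample name.
# Assumption: post-treatment sample names contain "post" and
# pre-treatment sample names contain "pre" (case-insensitive).
label_treatment <- function(sample_names) {
  is_post <- grepl("post", sample_names, ignore.case = TRUE)
  is_pre  <- grepl("pre", sample_names, ignore.case = TRUE)
  labels <- rep(NA_character_, length(sample_names))
  labels[is_pre]  <- "Pre"
  labels[is_post] <- "Post"  # "post" takes precedence if both patterns match
  labels
}

# Made-up sample names for illustration only.
samples <- c("P1.pre", "P1.post.1", "P2.PRE", "unknown")
data.frame(sample = samples, treatment = label_treatment(samples))
```

Names that match neither pattern are left as NA rather than silently assigned to a group, which makes unexpected sample names easy to spot before subsetting the post-treatment cells.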
Main outputs:
- Figures/UMAP_Tcell.pdf
- Figures/UMAP_post_CD4_Tcell.pdf
- Figures/UMAP_post_CD8_Tcell.pdf
- Results/Seurat_Reanalyzed.rds
- Results/GSE179994_post_CD4_reanalyzed.rds
- Results/GSE179994_post_CD8_reanalyzed.rds
- Results/CD4_filtered_post.h5ad (if SeuratDisk installed)
- Results/CD8_filtered_post.h5ad (if SeuratDisk installed)
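The optional h5ad export can be sketched with SeuratDisk's two-step route (`SaveH5Seurat()` followed by `Convert()`). This is a minimal sketch under stated assumptions, not the code in Scripts/scRNAseq.R: the helper name `export_h5ad` is hypothetical, and converting `cluster` and `Response` to character is only one plausible reading of the metadata harmonization step.

```r
# Hypothetical helper: write a Seurat object to <prefix>.h5ad when
# SeuratDisk is available, and skip the export otherwise.
export_h5ad <- function(seurat_obj, prefix,
                        columns = c("cluster", "Response")) {
  if (!requireNamespace("SeuratDisk", quietly = TRUE)) {
    message("SeuratDisk not installed; skipping h5ad export for ", prefix)
    return(invisible(FALSE))
  }
  # Assumption: store the selected metadata columns (when present) as
  # character so they are written as strings rather than factor codes.
  for (col in intersect(columns, colnames(seurat_obj@meta.data))) {
    seurat_obj@meta.data[[col]] <- as.character(seurat_obj@meta.data[[col]])
  }
  h5seurat_path <- paste0(prefix, ".h5Seurat")
  SeuratDisk::SaveH5Seurat(seurat_obj, filename = h5seurat_path,
                           overwrite = TRUE)
  # Convert() writes <prefix>.h5ad next to the intermediate .h5Seurat file.
  SeuratDisk::Convert(h5seurat_path, dest = "h5ad", overwrite = TRUE)
  invisible(TRUE)
}

# Usage, where cd4_post is a Seurat object holding the CD4 subset:
# export_h5ad(cd4_post, "Results/CD4_filtered_post")
```

Returning early when SeuratDisk is missing keeps the rest of the run usable, which matches the optional status of the .h5ad files in the list above.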
Methods notes:
- CreateSeuratObject filters: min.cells = 3, min.features = 200.
- Cell-level QC: nFeature_RNA > 600, nFeature_RNA < 25000, nCount_RNA > 600.
- Seurat workflow: NormalizeData, FindVariableFeatures, ScaleData, PCA, neighbors, clustering, UMAP.
- CD4/CD8 post-treatment QC with scuttle::isOutlier using nmads = 2.
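To make the two QC stages above concrete, the sketch below applies the fixed cell-level thresholds and then a two-sided 2-MAD rule to a toy table in base R. It is a stand-in for illustration, not the pipeline's code: the values are made up, `flag_mad_outliers` is a hypothetical helper that mimics the idea behind `scuttle::isOutlier` rather than calling it, and applying the rule to `nFeature_RNA` is an assumption because the list does not say which metrics are tested.

```r
# Toy per-cell QC table; the values are made up for illustration.
qc <- data.frame(
  nFeature_RNA = c(500, 1200, 1500, 1800, 30000, 1400, 1600, 4000),
  nCount_RNA   = c(900, 3000, 4000, 5200, 90000, 550, 4100, 9000)
)

# Stage 1: fixed cell-level thresholds from the list above.
keep <- qc$nFeature_RNA > 600 &
  qc$nFeature_RNA < 25000 &
  qc$nCount_RNA > 600
qc_kept <- qc[keep, ]

# Stage 2: base-R stand-in for a two-sided MAD rule. A value is flagged
# when it lies more than nmads scaled MADs from the median. stats::mad()
# applies the usual 1.4826 scaling constant by default.
flag_mad_outliers <- function(x, nmads = 2) {
  centre <- median(x)
  spread <- mad(x)
  x < centre - nmads * spread | x > centre + nmads * spread
}

# Assumption: the rule is applied to nFeature_RNA for this illustration.
qc_kept$low_quality <- flag_mad_outliers(qc_kept$nFeature_RNA, nmads = 2)
qc_kept
```

Because the MAD thresholds are computed from the cells being tested, they adapt to each subset, whereas the fixed thresholds stay the same for every run. With nmads = 2 the rule is stricter than a 3-MAD cutoff, so it flags more cells.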
References:
- Seurat v4 tutorial: https://satijalab.org/seurat/archive/v4.3/pbmc3k_tutorial
- Original publication: https://www.nature.com/articles/s43018-021-00292-8#Sec9
- Outlier-based QC rationale: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-02136-7
- Differential abundance context: https://www.nature.com/articles/s41598-024-66381-7
- scanpro: https://github.com/loosolab/scanpro