HSC-iM Manuscript Code Archive

This repository contains the code used for the HSC-iM manuscript: the numbered notebooks/scripts and the associated dataset folders for the key analyses underlying the figures. Because of size limits on the data upload, not all datasets are included in this repository. Please also refer to the GEO accessions listed in the Data Availability section of the manuscript.

Repository Layout

- Numbered notebooks/scripts are organized by analysis block:
  - ANALYSIS CODE - NOTEBOOKS 0-5
  - ANALYSIS CODE - NOTEBOOKS 6-10
  - ANALYSIS CODE - NOTEBOOKS 11-15
  - ANALYSIS CODE - NOTEBOOKS 16-20
- Data are organized by the same analysis blocks:
  - DATASETS - NOTEBOOKS 0-5
  - DATASETS - NOTEBOOKS 6-10
  - DATASETS - NOTEBOOKS 11-15
  - DATASETS - NOTEBOOKS 16-20

Analysis Notebooks

Supporting LT-HSC scRNA-seq Analysis

- 0a_LTHSC_scRNAseq_Clustering.ipynb: Processes LT-HSC scRNA-seq data for dimensionality reduction and clustering.
- 0b_HSPC_LTHSC_scRNAseq_Clustering.Rmd: Analyzes cord blood HSPC and LT-HSC multiome profiles using consensus non-negative matrix factorization (cNMF).

Xenograft HSPC Multiome Processing

- 1_FinalRun_Preprocessing.Rmd: Xenograft multiome preprocessing (QC, filtering, and demultiplexing).
- 2_Projection_Scoring.Rmd: Projects cells onto BoneMarrowMap and computes signature scores for various gene sets.
- 3_RNAClustering_Annotation.Rmd: RNA-based clustering with marker-guided annotation of cell states.
- 4_RNA_ATAC_JointClustering.Rmd: Performs joint RNA/ATAC clustering and saves the annotated object used for downstream biological comparisons.
- 4b_JointEmbedding_by_Condition.Rmd: Generates condition-stratified joint embeddings (e.g., PBS/TNF/LPS).
- 5_CellTypeMarkers_DE_DA.Rmd: Calculates marker genes and differential expression/accessibility profiles for each cell state.
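Because a notebook's numeric prefix determines its analysis block, the matching code and dataset folders can be looked up mechanically. The Python sketch below is a hypothetical helper that is not part of this repository. It assumes the block boundaries are exactly those encoded in the folder names above, and it only returns folder names, so it cannot tell whether a particular dataset was actually uploaded.

```python
import re

# Hypothetical helper (not in the repository). Block boundaries are taken
# from the folder names listed under "Repository Layout".
BLOCKS = [(0, 5), (6, 10), (11, 15), (16, 20)]


def block_folders(filename: str) -> tuple[str, str]:
    """Return the (code folder, dataset folder) expected to hold a numbered notebook."""
    match = re.match(r"(\d+)", filename)  # leading digits, e.g. '4' in '4b_...'
    if match is None:
        raise ValueError(f"no numeric prefix in {filename!r}")
    number = int(match.group(1))
    for low, high in BLOCKS:
        if low <= number <= high:
            label = f"NOTEBOOKS {low}-{high}"
            return f"ANALYSIS CODE - {label}", f"DATASETS - {label}"
    raise ValueError(f"prefix {number} is outside the documented blocks")


print(block_folders("4b_JointEmbedding_by_Condition.Rmd"))
# ('ANALYSIS CODE - NOTEBOOKS 0-5', 'DATASETS - NOTEBOOKS 0-5')
```

Filenames without a numeric prefix, or with a prefix above 20, raise `ValueError` rather than guessing a folder.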

Xenograft HSPC Regulon and Memory Analysis

- 6_Differential_TFregulon_Analysis.Rmd: Tests differential TF e-Regulon activity between HSC-iM and HSC-I, and between treatment conditions (PBS/TNF/LPS).
- 7_Augur.Rmd: Applies Augur to quantify the separability of HSC-iM and HSC-I within each treatment condition (PBS/TNF/LPS).
- 8_HSCiM_vs_BCGprogram.Rmd: Evaluates HSC-iM program enrichment within BCG-trained immunity datasets to assess shared biology.
- 9_HSCiM_vs_MemoryT_Akondy.Rmd: Evaluates HSC-iM program enrichment within functionally defined memory T-cell subsets profiled by bulk RNA-seq and ATAC-seq.
- 10_HSCiM_vs_MemoryT_Hao.Rmd: Evaluates HSC-iM program enrichment within immunophenotypically defined T-cell subsets profiled by PBMC CITE-seq.

Aging, post-COVID, and Clonal Hematopoiesis (CH) Benchmarking

- 11_postCOVID_analysis.Rmd: Evaluates HSC-iM program enrichment within HSC/MPP from post-COVID patient samples, establishing the relevance of HSC-iM to COVID-19 recovery.
- 12_AgingHSC_MetaAnalysis.Rmd: Evaluates HSC-iM program enrichment within HSC/MPP from aged vs young human bone marrow samples spanning multiple cohorts, establishing the relevance of HSC-iM to human HSC aging.
- 13_JakobsenCH_HSC2_HSCiM_Comparison.Rmd: Analyzes CH bone marrow samples profiled by TARGET-seq, demonstrating that the CH bone marrow HSC2 state is equivalent to the xenograft HSC-iM state.
- 14_JakobsenCH_HSCiM_Additional_Analyses.Rmd: Further CH TARGET-seq analyses, including regulon analysis and correlations of the HSC-iM program with age and clone size.
- 15_ExternalValidation_GSEA_Benchmarking.Rmd: Benchmarks the significance of HSC-iM program enrichment by GSEA across the aging, COVID, and CH datasets.
- 15b_HSCiM_GSEA_SickleCell.Rmd: Evaluates HSC-iM program enrichment within HSC/MPP from patients with sickle-cell disease.

Clonal Dominance, Progeny, and Intermountain Risk Score (IRS) Workflows

- 16_CH_HSC1_vs_HSC2_Dominance.Rmd: Identifies CH donors whose HSC pool is dominated by either HSC-I or HSC-iM, and compares their differentiation patterns.
- 17a_CH_BM_Xenograft_Analysis_pt1.ipynb: First-stage analysis of CH bone marrow and CH-derived xenograft samples; creates the merged objects and intermediate embeddings required by notebook 17b.
- 17b_CH_BM_Xenograft_Analysis_pt2.Rmd: Uses the notebook 17a intermediates to quantify HSC origin/composition and to compare cell state distributions between primary CH bone marrow and CH xenograft compartments.
- 18_Xenograft_ProgenitorMyeloid_SoupOrCell_Code.Rmd: Analyzes xenograft progenitor and myeloid cells, assigning HSC-I vs HSC-iM origin and comparing differentiation output and gene expression.
- 19_HSCiM_ProgenyAnalysis_byClone_byDonor.Rmd: Tests associations between HSC-iM program enrichment within the HSC pool and characteristics of downstream progeny, using correlation analyses at the clone and donor levels.
- 20a_HSCiM_IRS_forUpload_PART1.py: IRS pipeline part 1: preprocessing and score derivation to prepare inputs for downstream statistical modeling. Uses Ontario Health Study (OHS) data, which are available upon request.
- 20b_HSCiM_IRS_forUpload_PART2.R: IRS pipeline part 2: model fitting, interpretation, and generation of figure summaries. Uses Ontario Health Study (OHS) data, which are available upon request.

Dataset Folder Guide

- DATASETS - NOTEBOOKS 0-5: Core multiome xenograft inputs and outputs for notebooks 0-5, including processed objects and differential expression outputs.
- DATASETS - NOTEBOOKS 6-10: External validation and regulon analysis inputs/outputs for notebooks 6-10 (BCG, memory T cells, peak/RNA matrices, and Augur/SCENIC-related analyses).
- DATASETS - NOTEBOOKS 11-15: Aging, post-COVID, and CH analysis inputs and outputs for notebooks 11-15, including reference objects, DEGs, and gene sets.
- DATASETS - NOTEBOOKS 16-20: Inputs and outputs for notebooks 16-20, covering CH ex vivo/xenograft integrations and clone/donor analyses across datasets.
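Some notebooks consume the outputs of earlier ones: notebook 4 saves the annotated object used downstream, 17b uses the 17a intermediates, and 20b continues the 20a IRS pipeline. Plain lexicographic sorting misorders the prefixes (for example, `10_` sorts before `2_`), so the Python sketch below, a hypothetical helper not included in the repository, sorts filenames by numeric prefix and then by letter suffix. Treating the result as a global run order is an assumption; only the dependencies named above are stated in this README.

```python
import re


def run_order_key(filename: str) -> tuple[int, str]:
    """Sort key: numeric prefix first, then the optional letter suffix.

    Hypothetical helper; assumes names like '4_...', '4b_...', '17a_...'.
    An unsuffixed prefix ('4') sorts before its lettered variants ('4b')
    because the empty string compares less than any letter.
    """
    match = re.match(r"(\d+)([a-z]?)_", filename)
    if match is None:
        raise ValueError(f"no numbered prefix in {filename!r}")
    return int(match.group(1)), match.group(2)


# Example filenames taken from the notebook list in this README.
files = [
    "17b_CH_BM_Xenograft_Analysis_pt2.Rmd",
    "2_Projection_Scoring.Rmd",
    "17a_CH_BM_Xenograft_Analysis_pt1.ipynb",
    "10_HSCiM_vs_MemoryT_Hao.Rmd",
    "4b_JointEmbedding_by_Condition.Rmd",
    "4_RNA_ATAC_JointClustering.Rmd",
]

ordered = sorted(files, key=run_order_key)
for name in ordered:
    print(name)
```

This prints 2, 4, 4b, 10, 17a, 17b in that order. Note that 20a/20b still require the OHS data, which are only available upon request.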