Single-cell RNA sequencing (scRNA-seq) is an important tool for revealing the heterogeneity and dynamic processes of tissues, organisms, and complex diseases. However, technical capture loss (dropout) often obscures true biological expression, and existing imputation methods struggle to distinguish biological zeros (genes that are genuinely silent) from technical noise. To address this, we propose the imputation framework scZN. scZN assumes that observed scRNA-seq data arise from the combination of a two-state transcription process and dropout, and it formulates imputation as a nonnegative factorization problem: the raw count matrix is decomposed into two interpretable nonnegative factors, which are learned and optimized under constraints from prior knowledge and multiple regularization terms, thereby reconstructing the cellular expression landscape. Experiments show that scZN captures the true distributional characteristics at both the gene and cell levels and significantly suppresses spurious activation of genes that should not be expressed. Across multiple real datasets, it outperforms dozens of state-of-the-art methods. In complex experimental designs in particular, scZN markedly improves trajectory inference on embryonic stem cell and mouse dentate gyrus data. On Alzheimer's disease data, scZN also effectively recovers pathways related to neuroinflammation, improving downstream scRNA-seq analysis. Overall, scZN provides a unified framework for missing-value imputation and expression reconstruction that combines accuracy with interpretability.
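To make the factorization idea concrete, the following is a minimal sketch of imputation by weighted nonnegative matrix factorization with multiplicative updates. It is not the scZN algorithm: the abstract does not specify the objective, the priors, or the regularizers, so the function name `masked_nmf_impute`, the parameters `rank`, `zero_weight`, and `l2`, and the choice to down-weight observed zeros are all assumptions made for illustration only.

```python
import numpy as np

def masked_nmf_impute(X, rank=2, zero_weight=0.1, l2=0.01, n_iter=200, seed=0):
    """Illustrative weighted NMF (hypothetical stand-in, not scZN itself).

    X is a nonnegative genes-by-cells count matrix. Observed zeros get a
    smaller weight (zero_weight, an assumed value) because some of them may
    be dropouts rather than true biological zeros.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_genes, n_cells = X.shape

    # Per-entry confidence: 1 for nonzero counts, zero_weight for zeros.
    M = np.where(X > 0, 1.0, zero_weight)

    # Strictly positive initialization keeps multiplicative updates valid.
    W = rng.random((n_genes, rank)) + 0.1
    H = rng.random((rank, n_cells)) + 0.1

    eps = 1e-10  # guards against division by zero
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates for the weighted squared error with an
        # L2 penalty on both factors; both factors stay nonnegative.
        WH = W @ H
        W *= ((M * X) @ H.T) / ((M * WH) @ H.T + l2 * W + eps)
        WH = W @ H
        H *= (W.T @ (M * X)) / (W.T @ (M * WH) + l2 * H + eps)

        R = X - W @ H
        losses.append(float(np.sum(M * R**2) + l2 * (np.sum(W**2) + np.sum(H**2))))

    # The low-rank product serves as the imputed expression matrix.
    return W @ H, W, H, losses


if __name__ == "__main__":
    # Synthetic toy data: low-rank signal with random entries zeroed out.
    rng = np.random.default_rng(1)
    truth = rng.random((30, 3)) @ rng.random((3, 20)) * 5.0
    dropout = rng.random(truth.shape) < 0.3
    observed = np.where(dropout, 0.0, truth)
    X_hat, W, H, losses = masked_nmf_impute(observed, rank=3)
    print(X_hat.shape, losses[0], losses[-1])
```

In this sketch the weight matrix is the only mechanism that separates likely dropouts from trusted observations; a method along the lines described above would presumably replace it with model-based constraints derived from the two-state transcription assumption.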
Published in: PLoS Computational Biology
Volume 22, Issue 3, e1014051