Missense single nucleotide variants (SNVs) represent one of the most common forms of genetic variation and account for a substantial proportion of variants of uncertain significance in clinical databases. Accurate computational classification of these variants remains an important challenge in precision medicine and genomic research. In this study, we present PathoPredictor, an interpretable machine-learning framework designed to distinguish pathogenic from benign missense variants using curated clinical variant data and functional annotations. High-confidence variants were obtained from the November 2023 ClinVar release and annotated using dbNSFP v5.1 (GRCh37). After data filtering, imputation, and normalization, 59,302 expert-reviewed missense variants were retained for model development. Six machine-learning algorithms were evaluated under identical cross-validation conditions applied to the training set. Among the evaluated models, LightGBM demonstrated the strongest overall performance and was selected as the final PathoPredictor classifier, achieving a mean ROC–AUC of 0.93 ± 0.004, accuracy of 0.90 ± 0.006, and Matthews correlation coefficient (MCC) of 0.80 ± 0.008 across five cross-validation folds. Model interpretability was examined using SHAP (SHapley Additive exPlanations), enabling both global feature ranking and variant-level explanation of predictions. Temporal validation using ClinVar variants submitted after November 2023 showed consistent predictive performance on previously unseen submissions within the same database ecosystem (ROC–AUC = 0.91). While the framework demonstrates strong discrimination and structured interpretability, potential limitations include training data bias and partial circularity associated with the inclusion of existing meta-predictors.
Overall, PathoPredictor provides a reproducible and interpretable computational framework for integrating functional annotations in missense variant prioritization, supporting research and genomic analysis workflows.
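For readers less familiar with the Matthews correlation coefficient reported above, the following is a minimal sketch of how accuracy and MCC can be derived from binary confusion counts and then summarized as mean ± standard deviation across folds. It is not the authors' code: the function names (`mcc`, `accuracy`, `summarize`) and the per-fold counts are invented purely for illustration and are not data from the study.

```python
import math
from statistics import mean, stdev


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is reported as 0 when a marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of variants classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def summarize(values):
    """Mean and sample standard deviation across folds."""
    return mean(values), stdev(values)


# Hypothetical confusion counts (tp, tn, fp, fn) for five folds.
# These numbers are made up for the example, not taken from the paper.
folds = [
    (90, 85, 15, 10),
    (88, 87, 13, 12),
    (91, 84, 16, 9),
    (89, 86, 14, 11),
    (90, 88, 12, 10),
]

acc_mean, acc_sd = summarize([accuracy(*f) for f in folds])
mcc_mean, mcc_sd = summarize([mcc(*f) for f in folds])
print(f"accuracy = {acc_mean:.3f} ± {acc_sd:.3f}")
print(f"MCC      = {mcc_mean:.3f} ± {mcc_sd:.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when pathogenic and benign classes are imbalanced, which is one reason it is often reported alongside ROC–AUC.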
Published in: Journal of Genome Biotechnology and Genetics
Volume 1, Issue 1, pp. 3-3
DOI: 10.3390/jgbg1010003