Enhanced early detection of thyroid cancer using ensemble machine learning and serum proteomics

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Jiangbo Ding · Xian Yang Central Hospital

Zhangjian Zhou · Second Affiliated Hospital of Xi'an Jiaotong University

Abstract

Background: Thyroid cancer presents a significant clinical challenge due to asymptomatic onset and poor post-metastasis prognosis. Current imaging methods lack specificity, and single biomarkers show limited diagnostic accuracy. This study aimed to develop and validate a diagnostic model integrating serum proteomics and machine learning for early detection.

Methods: Serum samples from 414 thyroid cancer patients and 430 healthy controls were analyzed using MALDI-TOF MS. Multiple machine learning algorithms were applied to construct diagnostic models, which were evaluated in an independent test set. Model interpretability was assessed using SHAP and LIME, and key peptides were identified through feature importance analysis. A simplified diagnostic model was subsequently reconstructed using the selected features. Discriminative performance was evaluated using the area under the receiver operating characteristic curve (ROC-AUC), and clinical utility was evaluated using decision curve analysis (DCA). GO and KEGG enrichment analyses were performed to elucidate the biological functions of differentially expressed proteins.

Results: The integrated machine learning model demonstrated excellent discriminative performance. Interpretability analyses indicated that the high performance of the model was driven by the robust and coordinated contributions of multiple features. Twelve peptide peaks significantly associated with thyroid cancer were identified, and the simplified model based on these features maintained high diagnostic accuracy and provided greater net clinical benefit than single-protein biomarkers. Enrichment analysis revealed that these proteins were involved in immune regulation, lipid metabolism, and other cancer-related biological processes.

Conclusions: This study established and validated a serum peptide-based diagnostic model integrating machine learning for thyroid cancer. The model exhibited promising diagnostic performance in the single-center cohort, providing a non-invasive strategy for early detection and a basis for further research.
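
The two evaluation measures named in the abstract, ROC-AUC and the net benefit used in decision curve analysis (DCA), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy labels, and the toy scores below are assumptions made purely for illustration, and the study's actual data and pipeline are not reproduced here. The sketch uses the standard definitions: AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and net benefit as TP/n - (FP/n) * pt/(1 - pt) at threshold probability pt.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Net benefit of calling 'positive' whenever score >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy in DCA: treat every subject as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = thyroid cancer, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

print(roc_auc(labels, scores))             # 0.9375
print(net_benefit(labels, scores, 0.5))    # 0.25
print(treat_all_net_benefit(labels, 0.5))  # 0.0
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all and treat-none (net benefit 0) strategies. A model is considered clinically useful at the thresholds where its curve lies above both.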

Topics & Keywords

Advanced Proteomics Techniques and Applications · Thyroid Cancer Diagnosis and Treatment · Machine Learning in Bioinformatics

UN Sustainable Development Goals

Reduced inequalities

Publication Details

Published in: Frontiers in Oncology

Volume 16

DOI: 10.3389/fonc.2026.1807894

Field-Weighted Citation Impact: 0.00