Background: Thyroid cancer presents a significant clinical challenge due to its asymptomatic onset and poor prognosis after metastasis. Current imaging methods lack specificity, and single biomarkers show limited diagnostic accuracy. This study aimed to develop and validate a diagnostic model integrating serum proteomics and machine learning for early detection.

Methods: Serum samples from 414 thyroid cancer patients and 430 healthy controls were analyzed using MALDI-TOF MS. Multiple machine learning algorithms were applied to construct diagnostic models, which were evaluated in an independent test set. Model interpretability was assessed using SHAP and LIME, and key peptides were identified through feature importance analysis. A simplified diagnostic model was subsequently reconstructed using the selected features. Discriminative performance was evaluated using ROC-AUC, and clinical utility was evaluated using decision curve analysis (DCA). GO and KEGG enrichment analyses were performed to elucidate the biological functions of differentially expressed proteins.

Results: The integrated machine learning model demonstrated excellent discriminative performance. Interpretability analyses indicated that the model's high performance was driven by the robust and coordinated contributions of multiple features. Twelve peptide peaks significantly associated with thyroid cancer were identified. The simplified model based on these features maintained high diagnostic accuracy and provided greater net clinical benefit than single-protein biomarkers. Enrichment analysis revealed that these proteins were involved in immune regulation, lipid metabolism, and other cancer-related biological processes.

Conclusions: This study established and validated a serum peptide-based diagnostic model for thyroid cancer that integrates machine learning. The model exhibited promising diagnostic performance in the single-center cohort, providing a non-invasive strategy for early detection and a basis for further research.
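The Methods compare models by net clinical benefit using DCA. As a rough illustration of what that quantity measures, the sketch below computes the standard net-benefit formula at a single risk threshold and compares it with a "treat all" strategy. The function names and the toy labels and probabilities are hypothetical and are not taken from the study; this is not the authors' implementation.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging every case whose predicted risk >= threshold.

    Standard DCA definition: TP/n - FP/n * (pt / (1 - pt)),
    where pt is the risk threshold.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging everyone, regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Toy data for illustration only (1 = cancer, 0 = healthy control).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

model_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(toy_labels, threshold=0.5)
print(f"model: {model_nb:.3f}, treat all: {treat_all_nb:.3f}")
```

A full decision curve repeats this calculation over a range of thresholds and plots the model against the "treat all" and "treat none" (net benefit of zero) strategies; a model is useful at a given threshold when its curve lies above both.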