Background/Objectives: Vitamin B12 deficiency is a common but often underdiagnosed complication in patients with type 2 diabetes (T2D) undergoing long-term metformin therapy. Accurate early prediction could enable targeted screening and timely intervention. This study aimed to develop and interpret a machine learning model for predicting vitamin B12 deficiency in metformin-treated patients with T2D, using eXtreme Gradient Boosting (XGBoost).

Methods: A retrospective cross-sectional study was conducted at a single endocrinology centre (La Rabta University Hospital, Tunis, Tunisia). Patients with T2D treated with metformin for at least three years were included (n = 257); those with conditions independently affecting vitamin B12 metabolism were excluded. Vitamin B12 deficiency was defined as a serum B12 level below 150 pmol/L or a borderline level (150–221 pmol/L) with concurrent hyperhomocysteinemia (>15 μmol/L). XGBoost was selected after comparison with Logistic Regression (L2), Random Forest, and Support Vector Machine on the same 5-fold stratified cross-validated pipeline. Hyperparameters were optimized via Bayesian search (100 iterations × 5-fold stratified cross-validation), with the Matthews correlation coefficient (MCC) as the primary optimization metric to account for class imbalance. Model interpretability was achieved using SHapley Additive exPlanations (SHAP). Discrimination and calibration were assessed on an independent test set using bootstrap 95% confidence intervals (2000 resamples).

Results: Of 257 patients, 95 (37.0%) presented with vitamin B12 deficiency. On the independent test set (n = 52), the optimized XGBoost model achieved an ROC-AUC of 0.671 [95% CI: 0.514–0.818], sensitivity of 0.737 [95% CI: 0.533–0.938], specificity of 0.545 [95% CI: 0.375–0.710], MCC of 0.273 [95% CI: 0.018–0.517], and a Brier Score of 0.259. SHAP analysis identified HbA1c, microalbuminuria, autonomic neuropathy, BMI, DN4 score, and fasting glucose as the most influential predictors. Nonlinear SHAP interaction plots revealed an increased predicted risk in patients with low HbA1c combined with a high cumulative metformin dose.

Conclusions: The XGBoost–SHAP framework provided interpretable predictions of vitamin B12 deficiency in patients with T2D on metformin, identifying key clinical profiles for targeted screening. External multi-centre validation is required before clinical deployment.
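
The outcome definition and the primary metric described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code. The function names (`is_b12_deficient`, `mcc_from_counts`), the inclusive treatment of the 150 and 221 pmol/L boundaries, and the handling of a missing homocysteine value are assumptions. The confusion-matrix counts are hypothetical values chosen only to be arithmetically consistent with the reported sensitivity and specificity on a 52-patient test set; the abstract does not report them.

```python
import math
from typing import Optional


def is_b12_deficient(b12_pmol_l: float,
                     homocysteine_umol_l: Optional[float] = None) -> bool:
    """Apply the deficiency definition stated in the Methods.

    Deficient if serum B12 < 150 pmol/L, or if B12 is borderline
    (150-221 pmol/L) with homocysteine > 15 umol/L.
    Assumption: both borderline bounds are inclusive, and a missing
    homocysteine value means the borderline rule cannot fire.
    """
    if b12_pmol_l < 150.0:
        return True
    borderline = 150.0 <= b12_pmol_l <= 221.0
    if borderline and homocysteine_umol_l is not None:
        return homocysteine_umol_l > 15.0
    return False


def mcc_from_counts(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (NOT reported in the abstract): 19 deficient and
# 33 non-deficient patients in a 52-patient test set.
TP, FN, TN, FP = 14, 5, 18, 15

sensitivity = TP / (TP + FN)
specificity = TN / (TN + FP)
mcc = mcc_from_counts(TP, FP, TN, FN)

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} mcc={mcc:.3f}")
```

Under these assumed counts the MCC comes out at roughly 0.27, well below what the sensitivity alone might suggest. That behaviour is one reason MCC suits imbalanced classes: it uses all four cells of the confusion matrix, so many false positives pull the score down even when most true cases are detected.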