The identification of bipolar disorder (BD), a severe psychiatric condition characterized by recurrent mood fluctuations, remains challenging due to substantial inter-individual variability, symptom overlap with other mental disorders, and imbalanced clinical data. Delayed or inaccurate diagnosis often leads to inappropriate treatment strategies and adverse clinical outcomes, highlighting the need for reliable, data-driven decision-support tools. In this study, we propose a robust hybrid machine learning framework that integrates class balancing, latent subgroup discovery, and ensemble learning to improve the accuracy and consistency of BD identification from tabular clinical data. The framework applies the Synthetic Minority Over-sampling Technique (SMOTE) exclusively to the training data to address class imbalance, followed by Gaussian Mixture Model (GMM)-based clustering to uncover latent patient subgroups and generate informative probabilistic features. These enriched features are subsequently used to train an optimized Extreme Gradient Boosting (XGBoost) classifier. Experimental evaluation on an independent test set demonstrates that the proposed model achieves 93% accuracy, 97% sensitivity (recall), 93% precision, 95% F1-score, and 79% specificity. When evaluated under identical experimental conditions, the proposed framework consistently outperforms baseline classifiers, including Support Vector Machine, Decision Tree, Logistic Regression, and Random Forest, with performance improvements ranging from 6% to 12%, depending on the comparator. The results indicate that combining SMOTE-based data balancing, GMM-driven latent feature enrichment, and gradient-boosted decision trees yields a scalable, interpretable, and clinically relevant decision-support system. This study supports the adoption of hybrid, data-driven approaches for early BD screening and personalized treatment planning in psychiatric healthcare settings.
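For readers less familiar with the five reported metrics, the following minimal sketch shows how each one follows from binary confusion-matrix counts, with BD taken as the positive class. The names `ConfusionCounts`, `classification_metrics`, and `f1_from` are hypothetical helpers introduced only for illustration, and the counts in the usage example are toy values, not data from this study. The sketch assumes every denominator is nonzero.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with BD as the positive class."""
    tp: int  # BD cases predicted as BD
    fp: int  # non-BD cases predicted as BD
    tn: int  # non-BD cases predicted as non-BD
    fn: int  # BD cases predicted as non-BD


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Compute the five metrics named in the abstract from raw counts.

    Assumes each denominator is nonzero (both classes are present
    and at least one positive prediction was made).
    """
    total = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # recall: share of BD cases found
    specificity = c.tn / (c.tn + c.fp)  # share of non-BD cases cleared
    precision = c.tp / (c.tp + c.fp)    # share of BD predictions that are right
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_from(precision, sensitivity),
    }


if __name__ == "__main__":
    # Toy counts chosen for easy arithmetic; NOT taken from the study.
    toy = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
    for name, value in classification_metrics(toy).items():
        print(f"{name}: {value:.3f}")

    # Consistency check on the reported figures: the harmonic mean of
    # 93% precision and 97% recall rounds to the reported 95% F1-score.
    print(f"F1 from reported precision/recall: {f1_from(0.93, 0.97):.2f}")
```

Sensitivity and specificity are reported separately because accuracy alone can mask weak performance on the minority class in imbalanced data. Here, the gap between 97% sensitivity and 79% specificity shows that the model errs toward flagging BD rather than missing it.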