Approximately 85% of the population in sub-Saharan Africa, equating to around 894 million people, depends on traditional biomass fuels such as firewood, charcoal, and agricultural waste for cooking. Understanding what shapes cooking fuel choice in sub-Saharan Africa is essential for supporting clean energy transitions and advancing national policies and global goals, including Sustainable Development Goals (SDGs) 3, 5, 7, and 13. This study therefore aimed to identify the key drivers of household cooking fuel choice in sub-Saharan Africa using supervised machine learning techniques. The analysis used the most recent Demographic and Health Survey (DHS) data, collected between 2015 and 2024 in 28 sub-Saharan African countries (N = 430,811 households). The DHS employs a multi-stage, stratified cluster sampling design, and household sampling weights were applied throughout the analysis to account for unequal probabilities of selection and non-response. Seven supervised learning models, namely Random Forest (RF), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), AdaBoost, Naive Bayes, and Artificial Neural Networks (ANN), were trained on 80% of the data, with the remaining 20% reserved for testing. Because the dataset was highly imbalanced, with a 5.4:1 ratio of unclean to clean fuel use, we applied the Synthetic Minority Oversampling Technique (SMOTE) during model training and used 10-fold cross-validation. Model performance was evaluated using accuracy, precision, recall, F1-score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). SHAP (SHapley Additive exPlanations) values were used to identify the key factors influencing the predictions. The study found that 84.40% of households in sub-Saharan Africa relied on unclean fuels for cooking, with significant disparities across household characteristics.
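As a consistency check on the figures above, a 5.4:1 ratio of unclean to clean fuel use corresponds to 5.4 / 6.4 ≈ 84.4% of households using unclean fuels, which matches the reported prevalence. The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority row is placed on the line segment between a real minority row and one of its nearest minority neighbours. This is not the authors' code. The function names, the toy data, and the choice of `k` are assumptions made for illustration, and the sketch treats all features as numeric, whereas the abstract does not say how categorical DHS predictors were handled.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating toward a near neighbour.

    Hypothetical helper for illustration only; numeric features are assumed.
    """
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def balance_training_set(majority, minority, k=5, seed=0):
    """Oversample the minority class until both classes have equal size."""
    n_new = max(0, len(majority) - len(minority))
    return minority + smote_sketch(minority, n_new, k=k, seed=seed)


# Toy data (hypothetical): 27 majority rows and 5 minority rows, a 5.4:1 ratio.
majority = [(float(i), 5.0) for i in range(27)]
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
balanced = balance_training_set(majority, minority, k=3)
print(len(majority), len(balanced))
```

In a cross-validated workflow, oversampling of this kind is normally applied only to the training portion of each fold, so that the held-out rows contain no synthetic data.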
The XGBoost model demonstrated the best predictive performance, achieving a mean accuracy of 80.43% (95% CI: 80.17-80.67%) and a mean AUC of 0.8987 (95% CI: 0.8962-0.9012), outperforming the other machine learning algorithms. SHAP analysis identified electricity as the highest-impact variable, followed by residence, TV ownership, highest education level, and wealth index. The analysis demonstrated that unclean fuel use for cooking remains highly prevalent in sub-Saharan Africa. Governments in sub-Saharan Africa should prioritize improving electricity access, reducing rural-urban disparities, expanding education, and strengthening household economic conditions to promote cleaner cooking fuels. The predictive model can help policymakers and development organizations identify the populations most at risk of relying on polluting fuels, enabling targeted and cost-effective interventions such as electrification programs, clean fuel subsidies, and awareness campaigns promoting clean cooking technologies.
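The abstract reports mean scores with 95% confidence intervals but does not state how the intervals were computed. One common approach, shown here purely as an assumption, is a Student's t interval over the per-fold scores from 10-fold cross-validation. The function name and the placeholder scores are hypothetical and are not the study's fold results.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom,
# which is the case for 10 folds. Pass a different value for other fold counts.
T_CRIT_DF9 = 2.262


def fold_mean_ci(scores, t_crit=T_CRIT_DF9):
    """Return (mean, lower, upper) for a t-based interval over per-fold scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two fold scores")
    mean = statistics.fmean(scores)
    # Standard error of the mean uses the sample standard deviation.
    half_width = t_crit * statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Placeholder per-fold accuracies (illustrative values only).
placeholder_scores = [0.80, 0.81, 0.80, 0.79, 0.81, 0.80, 0.80, 0.81, 0.79, 0.80]
mean, lower, upper = fold_mean_ci(placeholder_scores)
print(f"mean={mean:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

Other choices, such as bootstrap resampling of the test set, would give intervals with a different interpretation, so the reported intervals should be read in light of whichever method the full paper specifies.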