Context. Accurately predicting soil organic carbon (SOC) and quantifying its influencing factors are crucial for ecologically sustainable development worldwide.
Aims. This study developed a SOC prediction model using three machine learning methods (random forest, support vector machine (SVM), and deep neural network) based on 191 soil samples (100 from karst regions and 91 from non-karst regions). Variable selection and hyperparameter optimization were applied to improve model performance and to clarify how different environmental factors influence SOC.
Methods. The particle swarm optimization (PSO) algorithm was employed for hyperparameter tuning, and different variable combinations were tested to enhance model performance. Model accuracy was evaluated using mean absolute error (MAE), root mean square error (RMSE), and the coefficient of determination (R²). The SHAP method was further applied to analyze the differential contributions of environmental variables to SOC in karst and non-karst regions.
Key results. Hyperparameter tuning and variable selection significantly improved prediction accuracy. The hyperparameter-optimized SVM model achieved the best performance (R² = 0.7533, MAE = 5.8018, RMSE = 12.3503) when using the variable combination of altitude, slope, cosine of slope aspect, sine of slope aspect, mean curvature, total nitrogen, soil depth, and pH. However, the unified modeling framework failed to adequately capture the differential impacts of environmental factors on SOC between karst and non-karst regions. SHAP analysis revealed that the dominant factors in non-karst regions are altitude, soil depth, and slope, whereas in karst regions the dominant factors shift to total nitrogen, cosine of slope aspect, and pH.
Conclusions. This study established an optimized machine learning model capable of accurately predicting SOC under small-sample conditions and revealed the differential roles of environmental driving factors in karst and non-karst regions.
Implications. The research outcomes provide a viable solution for SOC prediction under small-sample conditions, emphasize the critical role of environmental variables in SOC modeling, and offer guidance for regionally differentiated soil carbon management.
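
The abstract names three evaluation metrics (MAE, RMSE, R²) and uses the sine and cosine of slope aspect as predictors. The following is a minimal sketch of how these quantities are conventionally computed; it is not the authors' code. The function names, the sample values, and the assumption that aspect is given in degrees measured clockwise from north are illustrative and do not come from the study.

```python
import math


def aspect_to_sin_cos(aspect_deg):
    """Encode a slope aspect (assumed: degrees, 0-360) as a (sin, cos) pair.

    The circular encoding places 0 and 360 degrees at the same point,
    which a raw angle would not.
    """
    rad = math.radians(aspect_deg)
    return math.sin(rad), math.cos(rad)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted SOC values (illustrative only,
# not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("MAE :", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R2  :", r2(observed, predicted))
print("aspect 90 deg ->", aspect_to_sin_cos(90.0))
```

Because RMSE squares each residual before averaging, it is never smaller than MAE on the same predictions and grows faster when a few errors are large.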