Transparent, evidence-based representations of global crude oil refining systems remain scarce in the public literature, which constrains robust energy-system modeling and policy analysis. This study develops a comprehensive, configuration-based modeling framework that covers all operating crude oil refineries worldwide and is built on plant-level process unit data. Forty unique refinery configurations are identified through an unsupervised, decision-tree-based clustering approach that accounts for both the presence of process units and their relative conversion intensity. An extremely randomized trees (ETR) machine learning model is trained on approximately 11,000 refinery-year observations to predict refined product yields as a function of refinery configuration, capacity, and crude oil diet. The model achieves out-of-sample coefficients of determination (R²) exceeding 0.90 for all major products and outperforms both multiple linear regression and other ensemble methods. The predictive model is coupled with a differential evolution optimization algorithm to support refinery programming under operational and feedstock constraints. Applying the model to Gulf Cooperation Council (GCC) refineries shows that, with existing technologies, petrochemical feedstock yields are bounded at approximately 37%, well below the announced long-term diversification targets of 70–85%. Operational optimization can raise these yields by up to 6 percentage points, but the gains require adjustments to capacity utilization and involve trade-offs with other products. The framework provides a scalable tool for refinery benchmarking, energy transition analysis, and strategic planning at the facility, national, and global levels.
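To make the idea of a configuration defined by "unit presence plus conversion intensity" concrete, the sketch below shows how a fixed tree of two splits could map a refinery to a configuration label. It is not the study's unsupervised clustering procedure, which learns its splits from data. The unit names (`fcc`, `hydrocracker`, `coker`), the intensity definition (conversion-unit capacity divided by crude distillation capacity), and the 0.3 threshold are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of conversion units; the study's actual unit taxonomy is not given here.
CONVERSION_UNITS = ("fcc", "hydrocracker", "coker")


@dataclass
class Refinery:
    name: str
    crude_capacity: float                                   # crude distillation capacity, kb/d
    units: dict[str, float] = field(default_factory=dict)   # unit name -> unit capacity, kb/d


def conversion_intensity(r: Refinery) -> float:
    """Hypothetical metric: total conversion-unit capacity relative to crude capacity."""
    total = sum(r.units.get(u, 0.0) for u in CONVERSION_UNITS)
    return total / r.crude_capacity


def configuration_label(r: Refinery, threshold: float = 0.3) -> str:
    """Assign a configuration label with a fixed two-level split (illustrative only)."""
    # First split: which conversion units are present (the presence signature).
    present = [u for u in CONVERSION_UNITS if r.units.get(u, 0.0) > 0]
    if not present:
        return "non-conversion"
    # Second split: relative conversion intensity, low vs. high (threshold is assumed).
    level = "high" if conversion_intensity(r) >= threshold else "low"
    return "+".join(present) + f"/{level}-conversion"


example = Refinery("A", 200.0, {"fcc": 50.0, "coker": 30.0})
print(configuration_label(example))  # intensity = 80 / 200 = 0.4
```

Here refinery "A" has an intensity of 0.4 and is labeled `fcc+coker/high-conversion`. A data-driven approach would instead choose the split variables and thresholds so that the resulting groups are as homogeneous as possible.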
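The coupling of a predictive yield model with differential evolution can be sketched as follows. This is a minimal, dependency-free version of the classic DE/rand/1/bin scheme that maximizes a yield prediction within box constraints. The `surrogate_petchem_yield` function is a hypothetical stand-in for the trained ETR model, and its two decision variables (capacity utilization and light-crude share of the diet), their bounds, and its coefficients are illustrative assumptions, not values from the study. In practice, `scipy.optimize.differential_evolution` provides a production implementation of the same algorithm.

```python
import random


def surrogate_petchem_yield(x):
    """Hypothetical stand-in for the trained ETR yield model.

    x = (utilization, light_crude_share); returns a petrochemical feedstock yield fraction.
    """
    utilization, light_share = x
    return 0.30 + 0.08 * light_share - 0.10 * (utilization - 0.85) ** 2


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimize f over box bounds using the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutation: donor = a + F * (b - c), clipped to the feasible box.
            donor = [
                min(max(pop[a][k] + F * (pop[b][k] - pop[c][k]), bounds[k][0]), bounds[k][1])
                for k in range(dim)
            ]
            # Binomial crossover; j_rand guarantees at least one donor component.
            j_rand = rng.randrange(dim)
            trial = [
                donor[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            # Greedy selection: keep the trial if it is at least as good.
            f_trial = f(trial)
            if f_trial <= fitness[i]:
                pop[i], fitness[i] = trial, f_trial
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


# Assumed operational and feedstock constraints: utilization 70-95%, light crude share up to 60%.
bounds = [(0.70, 0.95), (0.0, 0.6)]
# DE minimizes, so maximize the yield by minimizing its negative.
best_x, best_f = differential_evolution(lambda x: -surrogate_petchem_yield(x), bounds)
print(best_x, -best_f)
```

With this toy surrogate, the optimizer drives the light-crude share to its upper bound and utilization toward 0.85, which shows how a yield gain can be tied to a utilization adjustment. A real ETR model is piecewise constant, which is one reason a derivative-free, population-based method such as DE is a natural fit.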