<b>Background/Objectives</b>: Cytochrome P450 3A4 (CYP3A4) metabolizes approximately 30–50% of clinically used drugs; thus, accurate prediction of CYP3A4 inhibition is essential for early assessment of drug-drug interaction (DDI) risk and toxicity. This study evaluated an integrated artificial intelligence framework for predicting CYP3A4 inhibition (%) using a large, curated chemical dataset. <b>Methods</b>: A dataset of 23,713 compounds was compiled from the Korea Chemical Bank and multiple commercial and public databases. Vector-based machine learning (ML) models (LightGBM, XGBoost, CatBoost, and a weighted ML ensemble) and graph neural network (GNN) models (O-GNN with contrastive learning and manifold mixup (O-GNN + CL + Mixup), D-MPNN, GINE, and GATv2) were evaluated. Manifold mixup was applied during GNN training, and SMILES enumeration-based test-time augmentation was used at inference. The best-performing ML and GNN models were integrated using a weighted ensemble strategy. Model interpretability was examined using SHAP analysis for ML models and occlusion sensitivity analysis for O-GNN + CL + Mixup. <b>Results</b>: The weighted ML ensemble achieved the best performance among ML models (RMSE = 19.1031, Pearson correlation coefficient (PCC) = 0.7566); the O-GNN + CL + Mixup model performed best among GNN models (RMSE = 20.1002, PCC = 0.7265). The hybrid model further improved predictive accuracy (RMSE = 19.0784, PCC = 0.7570). External validation on 100 newly generated experimental data points confirmed generalizability (Custom Metric = 0.8035). <b>Conclusions</b>: This study demonstrated that integrating ML and GNN models with data augmentation strategies improves the robustness and interpretability of CYP3A4 inhibition prediction and established a practical framework for metabolic screening and DDI risk assessment.
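The weighted ensemble of an ML model and a GNN model, scored by RMSE and PCC, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the ensemble weights were chosen, so the grid search over a single blending weight is an assumption, as are the function names (`rmse`, `pcc`, `blend`, `search_weight`) and the synthetic predictions, which stand in for real model outputs.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted inhibition (%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def pcc(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def blend(pred_ml, pred_gnn, w_ml):
    """Convex combination of two prediction vectors; w_ml is the ML weight."""
    return w_ml * np.asarray(pred_ml, dtype=float) + (1.0 - w_ml) * np.asarray(
        pred_gnn, dtype=float
    )


def search_weight(y_val, pred_ml, pred_gnn, grid=None):
    """Pick the blending weight with the lowest validation RMSE.

    Assumption: a simple grid search over [0, 1]; the source does not
    describe its weighting procedure.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    scores = [(rmse(y_val, blend(pred_ml, pred_gnn, w)), float(w)) for w in grid]
    best_rmse, best_w = min(scores)
    return best_w, best_rmse


# Synthetic stand-ins for validation labels and model predictions (hypothetical data).
rng = np.random.default_rng(0)
y_val = rng.uniform(0.0, 100.0, size=500)
pred_ml = y_val + rng.normal(0.0, 19.0, size=500)
pred_gnn = y_val + rng.normal(0.0, 20.0, size=500)

best_w, best_rmse = search_weight(y_val, pred_ml, pred_gnn)
hybrid = blend(pred_ml, pred_gnn, best_w)
print(f"w_ml={best_w:.2f}  RMSE={rmse(y_val, hybrid):.4f}  PCC={pcc(y_val, hybrid):.4f}")
```

Because the grid includes the endpoints 0 and 1, the selected blend can never have a higher validation RMSE than either model alone. In practice the weight should be tuned on held-out validation data and then applied unchanged to the test set.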