Enhancing CYP3A4 Inhibition Prediction Using a Hybrid GNN–ML Model with Data Augmentation

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Somin Woo · Daegu National University of Education

Ju-Hyeok Jeon · GeneMatrix (South Korea)

Sangil Han · Kyungpook National University

C. Justin Lee · Kyungpook National University

Sang Hyun Min · Daegu National University of Education

Abstract

Background/Objectives: Cytochrome P450 3A4 (CYP3A4) metabolizes approximately 30–50% of clinically used drugs; thus, accurate prediction of CYP3A4 inhibition is essential for early assessment of drug-drug interaction (DDI) risk and toxicity. This study evaluated an integrated artificial intelligence framework for predicting CYP3A4 inhibition (%) using a large, curated chemical dataset.

Methods: A dataset of 23,713 compounds was compiled from the Korea Chemical Bank and multiple commercial and public databases. Vector-based machine learning (ML) models (LightGBM, XGBoost, CatBoost, and a weighted ML ensemble) and graph neural network (GNN) models (O-GNN with contrastive learning and manifold mixup (O-GNN + CL + Mixup), D-MPNN, GINE, and GATv2) were evaluated. Manifold mixup was applied during GNN training, and SMILES enumeration-based test-time augmentation was used at inference. The best-performing ML and GNN models were integrated using a weighted ensemble strategy. Model interpretability was examined using SHAP analysis for ML models and occlusion sensitivity analysis for O-GNN + CL + Mixup.

Results: The weighted ML ensemble achieved the best performance among ML models (RMSE = 19.1031, Pearson correlation coefficient (PCC) = 0.7566); the O-GNN + CL + Mixup model performed the best among GNN models (RMSE = 20.1002, PCC = 0.7265). The hybrid model achieved improved predictive accuracy (RMSE = 19.0784, PCC = 0.7570). External validation on 100 newly generated experimental data points confirmed generalizability (Custom Metric = 0.8035).

Conclusions: This study demonstrated that integrating ML and GNN models with data augmentation strategies improves the robustness and interpretability of CYP3A4 inhibition prediction and established a practical framework for metabolic screening and DDI risk assessment.
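The weighted ensemble of the best ML and GNN models, scored by RMSE and PCC, can be illustrated with a minimal sketch. The abstract does not state how the ensemble weights were chosen, so the grid search over a single weight below is an assumption, as are the function names (`weighted_ensemble`, `best_weight`) and the toy inhibition values; none of this reproduces the authors' pipeline.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (PCC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def weighted_ensemble(ml_pred, gnn_pred, w_ml):
    """Blend predictions: w_ml for the ML model, (1 - w_ml) for the GNN."""
    return [w_ml * a + (1.0 - w_ml) * b for a, b in zip(ml_pred, gnn_pred)]


def best_weight(y_true, ml_pred, gnn_pred, steps=20):
    """Assumed selection rule: pick the grid weight with the lowest RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: rmse(y_true, weighted_ensemble(ml_pred, gnn_pred, w)),
    )


# Toy validation data (hypothetical inhibition percentages, not from the study).
y_val = [10.0, 35.0, 60.0, 85.0]
ml_val = [14.0, 30.0, 66.0, 80.0]
gnn_val = [6.0, 41.0, 55.0, 92.0]

w = best_weight(y_val, ml_val, gnn_val)
hybrid_val = weighted_ensemble(ml_val, gnn_val, w)
print(f"w_ml={w:.2f}  RMSE={rmse(y_val, hybrid_val):.4f}  PCC={pearson(y_val, hybrid_val):.4f}")
```

Because the candidate grid includes 0 and 1, the selected blend is never worse than either single model on the data used to choose the weight; in practice the weight would be fitted on a validation split and the metrics reported on held-out data.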

Topics & Keywords

Pharmacogenetics and Drug Metabolism · Computational Drug Discovery Methods · Machine Learning in Bioinformatics

Publication Details

Published in: Pharmaceuticals

Volume 19, Issue 2, p. 258

DOI: 10.3390/ph19020258

Field-Weighted Citation Impact: 0.00
