4CPS-305 Classification of patients with suspected hepatitis C with a machine learning model

2026 · 0 citations · Journal Article

Authors

DJ Boardman González · Hospital Virgen de la Luz

MI Martin Niño · Hospital Virgen de la Luz

E García López · Hospital Virgen de la Luz

B Martínez Ruiz · Hospital Virgen de la Luz

G Picazo Sanchiz · Hospital Virgen de la Luz

L Martínez Valdivieso · Hospital Virgen de la Luz

Abstract

<h3>Background and Importance</h3> Early diagnosis of hepatitis C virus (HCV) infection is essential to prevent progression to advanced liver disease. Machine learning (ML) offers new possibilities for clinical prediction based on biomarkers.
<h3>Aim and Objectives</h3> To evaluate the ability of a Random Forest model, applied to a clinical dataset, to classify patients at different stages of liver involvement.
<h3>Material and Methods</h3> The dataset was obtained from the University of California-Irvine Machine Learning Repository and comprised 589 individuals classified into five groups: blood donors, suspected cases, hepatitis, fibrosis, and cirrhosis. Variables included age, sex, and liver biochemical parameters (albumin, alkaline phosphatase, aminotransferases, bilirubin, cholinesterase, cholesterol, creatinine, GGT, and total proteins). The model was trained using k-fold cross-validation (k=10). Area under the curve (AUC), sensitivity, and specificity were recorded in each iteration. Variable importance was analysed through model performance and the Kruskal-Wallis test.
<h3>Results</h3> Of the 589 patients, 533 (90.5%) were healthy blood donors, 24 (4.1%) had cirrhosis, 12 (2%) had fibrosis, and 20 (3.4%) had hepatitis. The median age was 47 years, and 363 patients (59%) were male. Inferential statistical analysis revealed significant differences (p<0.05) in the following variables: albumin (median 42 g/L; p=0.01), GGT (median 21.3 U/L; p=0.020), cholinesterase (median 5.45 U/mL; p=0.0004), cholesterol (median 5.4 mmol/L; p=0.0081), creatinine (median 78 µmol/L; p=0.011), and total proteins (median 71.2 g/L; p=0.016). No statistically significant differences were observed for the remaining variables. The Random Forest model showed a median AUC of 0.9435, sensitivity of 100%, and specificity of 90.95%, with no false negatives in cross-validation.
<h3>Conclusion and Relevance</h3> Random Forest demonstrated high classification accuracy in patients with suspected HCV infection, confirming the value of combining liver biomarkers with ML to improve diagnosis. This approach could optimise screening tools in hospital settings. However, prospective studies incorporating additional clinical variables are needed, as the model was based on historical data without real-world validation.
<h3>References and/or Acknowledgements</h3> 1. Lichtinghagen R, Klawonn F, Hoffmann G. HCV data [Dataset]. UCI Machine Learning Repository; 2020. https://doi.org/10.24432/C5D612
<h3>Conflict of Interest</h3> No conflict of interest.
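The evaluation scheme described under Material and Methods (stratified folds, with AUC, sensitivity, and specificity recorded per fold and summarised by the median) can be illustrated with a minimal sketch. This is not the authors' code. No Random Forest is trained here: the labels and scores are made-up toy values standing in for model output, and the binary coding (1 = liver disease, 0 = blood donor), the decision threshold of 50, the use of 5 folds instead of 10, and all function names are assumptions chosen for illustration.

```python
from statistics import median

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y_true, k):
    """Split sample indices into k folds, round-robin within each class,
    so every fold keeps roughly the overall class ratio."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == label]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

# Toy data (hypothetical): 20 donors (0) and 10 patients (1).
# Scores are integers on a 0-100 scale standing in for model output.
y_true = [0] * 20 + [1] * 10
scores = [10 + 3 * i for i in range(20)] + [55 + 4 * j for j in range(10)]
THRESHOLD = 50  # assumed cut-off: score >= 50 is predicted as disease

folds = stratified_folds(y_true, k=5)
per_fold = []
for fold in folds:
    yt = [y_true[i] for i in fold]
    sc = [scores[i] for i in fold]
    yp = [1 if s >= THRESHOLD else 0 for s in sc]
    sens, spec = sensitivity_specificity(yt, yp)
    per_fold.append((auc(yt, sc), sens, spec))

# Summarise each metric by its median across folds.
median_auc = median([m[0] for m in per_fold])
median_sens = median([m[1] for m in per_fold])
median_spec = median([m[2] for m in per_fold])
print(median_auc, median_sens, median_spec)
```

Stratifying the folds matters when one class dominates, as with the blood donors here: a purely random split could leave a fold with very few positive cases, making its sensitivity unstable or undefined.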

Topics & Keywords

Artificial Intelligence in Healthcare · Hepatitis C virus research · Liver Disease Diagnosis and Treatment

Publication Details

Published in: Section 4: Clinical pharmacy services

DOI: 10.1136/ejhpharm-2026-eahp.402
