Biologically informed genetic data transformations improve multi-omic comorbidity prediction in people with HIV

2026 · 0 citations · Journal Article · Green Open Access

Authors

Barry Ryan · École Polytechnique Fédérale de Lausanne

Christian W. Thorball · University of Lausanne

Mariam Ait Oumelloul · École Polytechnique Fédérale de Lausanne

Roger Kouyos · University of Zurich

PE Tarr · University Hospital of Basel

Jacques Fellay · École Polytechnique Fédérale de Lausanne

Abstract

Coronary artery disease (CAD) and chronic kidney disease (CKD) are in part genetically determined and are associated with various omics layers. Methods for integrating genomics data with other omics profiles remain to be standardised. This study evaluates biological data transformations to optimise the integration of genomics with other omics for comorbidity prediction in people with HIV (PWH). We trained linear and deep-learning single-omic and multi-omic models on two cohorts of PWH for whom genotype data and one other omics layer were available. In total, 436 CAD cases and 166 CKD cases were evenly split across train/validation/test cohorts. For multi-omic integration, we compared feature concatenation against encoder-based architectures; performance was estimated via five-fold cross-validation on fixed patient splits, reporting mean accuracy and standard errors. Genotype data were represented in four ways: (i) raw SNP genotype matrices; (ii) principal component analysis (PCA) embeddings; (iii) polygenic risk scores (PRS); and (iv) AlphaGenome-derived gene-level impact scores. Each genotype representation was evaluated both on its own and when integrated in a multi-omics model. The results demonstrate that biologically informed genomic transformations improve prediction in multi-omics models. In both classification tasks, integrating raw SNPs (CAD accuracy = 0.55 ± 0.03; CKD accuracy = 0.63 ± 0.01) or genotype PCs (CAD accuracy = 0.54 ± 0.03; CKD accuracy = 0.62 ± 0.03) with other omics reduced performance relative to the best corresponding single-omics models. By contrast, integrating PRS (CAD accuracy = 0.61 ± 0.03; CKD accuracy = 0.65 ± 0.02) or AlphaGenome scores (CAD accuracy = 0.57 ± 0.03; CKD accuracy = 0.67 ± 0.02) improved accuracy. As multi-omics analyses become more prominent, methods that integrate genomics effectively without requiring large cohorts will become increasingly valuable; here, we highlight two such approaches.
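To make two quantities named in the abstract concrete, here is a minimal Python sketch of (a) how a "mean accuracy ± standard error" figure can be derived from per-fold accuracies and (b) the textbook form of a polygenic risk score as a weighted sum of allele dosages. It is not the authors' pipeline: the function names, the fold accuracies, and the SNP weights are all hypothetical, and the abstract does not state which standard-error estimator or PRS tool was used.

```python
import math
import statistics


def mean_and_standard_error(fold_accuracies):
    """Return (mean, standard error of the mean) for per-fold accuracies.

    Assumption: the standard error is the sample standard deviation
    (n - 1 denominator) divided by sqrt(n); the abstract does not say
    which estimator the study used.
    """
    n = len(fold_accuracies)
    if n < 2:
        raise ValueError("need at least two folds to estimate a standard error")
    mean = statistics.fmean(fold_accuracies)
    sem = statistics.stdev(fold_accuracies) / math.sqrt(n)
    return mean, sem


def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) for one individual.

    This is the generic PRS definition; real pipelines add steps such as
    variant filtering and effect-allele harmonisation that are omitted here.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical per-fold accuracies from a five-fold cross-validation run.
fold_accuracies = [0.50, 0.60, 0.70, 0.60, 0.60]
mean, sem = mean_and_standard_error(fold_accuracies)
print(f"accuracy = {mean:.2f} ± {sem:.2f}")

# Hypothetical dosages and per-SNP effect weights for a single individual.
score = polygenic_risk_score([0, 1, 2], [0.1, -0.2, 0.3])
print(f"PRS = {score:.2f}")
```

A single score per individual is a far lower-dimensional input than a raw SNP matrix, which is one plausible reading of why the abstract describes such transformations as useful when cohorts are small.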

Topics & Keywords

Genetic Associations and Epidemiology · HIV-related health complications and treatments · Genetic Mapping and Diversity in Plants and Animals

UN Sustainable Development Goals

Good health and well-being

Publication Details

Published in: medRxiv

DOI: 10.64898/2026.03.09.26347570

Field-Weighted Citation Impact: 0.00