Abstract Coronary artery disease (CAD) and chronic kidney disease (CKD) are in part genetically determined and are associated with various omics layers. Methods for integrating genomics data with other omics profiles have yet to be standardised. This study evaluates biological data transformations to optimise the integration of genomics with other omics for comorbidity prediction in people with HIV (PWH). We trained linear and deep-learning single-omics and multi-omics models on two cohorts of PWH with genotype data and one other omics layer available. The 436 CAD cases and 166 CKD cases were evenly split across train/validation/test cohorts. For multi-omics integration, we compared feature concatenation with encoder-based architectures; performance was estimated via five-fold cross-validation on fixed patient splits, reporting mean accuracy and standard errors. Genotype data were represented in four ways: (i) raw SNP genotype matrices; (ii) principal component analysis (PCA) embeddings; (iii) polygenic risk scores (PRS); and (iv) AlphaGenome-derived gene-level impact scores. Each genotype representation was evaluated on its own and when integrated into a multi-omics model. The results demonstrate that biologically informed genomic transformations improve prediction in multi-omics models. In both classification tasks, integrating raw SNPs (CAD accuracy = 0.55 ± 0.03; CKD accuracy = 0.63 ± 0.01) or genotype PCs (CAD accuracy = 0.54 ± 0.03; CKD accuracy = 0.62 ± 0.03) with other omics reduced performance relative to the best corresponding single-omics models. By contrast, PRS (CAD accuracy = 0.61 ± 0.03; CKD accuracy = 0.65 ± 0.02) and AlphaGenome (CAD accuracy = 0.57 ± 0.03; CKD accuracy = 0.67 ± 0.02) improved accuracy. As multi-omics analyses become more prominent, methods that integrate genomics effectively without requiring large cohorts will become increasingly valuable; here, we highlight two such approaches.
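To make three of the ideas above concrete, the Python sketch below shows one common way to compute a polygenic risk score (a weighted sum of allele dosages), to integrate it with another omics layer by feature concatenation, and to summarise per-fold accuracies as a mean with its standard error. It is an illustration under stated assumptions, not the study's pipeline: the function names, the SNP weights, the omics values, and the fold accuracies are all hypothetical, and the abstract does not say how the PRS or the standard errors were actually computed.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def concatenate_features(*blocks: Sequence[float]) -> List[float]:
    """Feature-level integration: join one patient's feature blocks end to end."""
    return [value for block in blocks for value in block]


def mean_and_standard_error(fold_accuracies: Sequence[float]) -> Tuple[float, float]:
    """Mean accuracy across folds and its standard error (sample SD / sqrt(k))."""
    k = len(fold_accuracies)
    if k < 2:
        raise ValueError("need at least two folds")
    mean = statistics.mean(fold_accuracies)
    standard_error = statistics.stdev(fold_accuracies) / math.sqrt(k)
    return mean, standard_error


# Hypothetical toy values for a single patient, not data from the study.
snp_dosages = [0, 1, 2, 1]
effect_weights = [0.5, -0.25, 0.5, 0.25]
other_omics = [1.5, 0.75]

prs = polygenic_risk_score(snp_dosages, effect_weights)
features = concatenate_features([prs], other_omics)

# Hypothetical accuracies from five cross-validation folds.
mean_acc, se_acc = mean_and_standard_error([0.5, 0.75, 0.5, 0.75, 0.5])
print(f"features = {features}")
print(f"accuracy = {mean_acc:.2f} ± {se_acc:.2f}")
```

Collapsing thousands of SNP columns into a single score is what keeps the concatenated feature vector short, which is one plausible reason such a representation might be easier to learn from in small cohorts than a raw genotype matrix.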