Overcoming the Curse of Dimensionality with Synolitic AI

2026 · 0 citations · Journal Article · Gold Open Access

Authors

Alexey V. Zaikin · University College London

Ivan Sviridov · Timber Institute

Artem Sosedka · National University of Science and Technology

Anastasia Linich · Yandex School of Data Analysis

Ruslan Nasyrov · Moscow Banking Institute

Evgeny M. Mirkes · Research Centre for Medical Genetics

Tatiana Tyukina

Abstract

High-dimensional tabular data are common in biomedical and clinical research, yet conventional machine learning methods often struggle in such settings due to data scarcity, feature redundancy, and limited generalization. In this study, we systematically evaluate Synolitic Graph Neural Networks (SGNNs), a framework that transforms high-dimensional samples into sample-specific graphs by training ensembles of low-dimensional pairwise classifiers and analyzing the resulting graph structure with Graph Neural Networks. We benchmark convolution-based (GCN) and attention-based (GATv2) models across 15 UCI datasets under two training regimes: a foundation setting that concatenates all datasets and a dataset-specific setting with macro-averaged evaluation. We further assess cross-dataset transfer, robustness to limited training data, feature redundancy, and computational efficiency, and extend the analysis to a real-world ovarian cancer proteomics dataset. The results show that topology-aware node feature augmentation provides the dominant performance gains across all regimes. In the foundation setting, GATv2 achieves an ROC-AUC of up to 92.22 (GCN: 91.22), substantially outperforming XGBoost (86.05; α = 0.001). In the dataset-specific regime, GATv2 combined with minimum-connectivity filtering achieves a macro ROC-AUC of 83.12, compared to 80.28 for XGBoost. Leave-one-dataset-out evaluation confirms cross-domain transfer, with an ROC-AUC of up to 81.99. SGNNs maintain an ROC-AUC of around 85% with as little as 10% of the training data and consistently outperform XGBoost in more extreme low-data regimes (α = 0.001). On ovarian cancer proteomics data, foundation training improves both predictive performance and stability. Efficiency analysis shows that graph filtering substantially reduces training time, inference latency, and memory usage without compromising accuracy.
Overall, these findings suggest that SGNNs provide a robust and scalable approach for learning from high-dimensional, heterogeneous tabular data, particularly in biomedical settings with limited sample sizes.
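
The graph-construction step described in the abstract (one low-dimensional classifier per feature pair, whose output for a given sample becomes an edge weight) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier is a stand-in chosen only to keep the example dependency-free, the function names `fit_pairwise_centroids` and `sample_graph` are hypothetical, and the downstream GNN, node feature augmentation, and graph filtering are omitted.

```python
from itertools import combinations
from math import exp


def fit_pairwise_centroids(X, y):
    """For every feature pair (i, j), store the 2-D centroid of each class.

    Assumption: a nearest-centroid rule stands in for the pairwise
    classifiers; the paper's actual classifier is not specified here.
    """
    n_features = len(X[0])
    models = {}
    for i, j in combinations(range(n_features), 2):
        centroids = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            centroids[label] = (
                sum(r[i] for r in rows) / len(rows),
                sum(r[j] for r in rows) / len(rows),
            )
        models[(i, j)] = centroids
    return models


def sample_graph(x, models):
    """Build one sample-specific graph: nodes are features, and each edge
    (i, j) carries the pairwise classifier's score for class 1."""
    edges = {}
    for (i, j), c in models.items():
        d0 = (x[i] - c[0][0]) ** 2 + (x[j] - c[0][1]) ** 2
        d1 = (x[i] - c[1][0]) ** 2 + (x[j] - c[1][1]) ** 2
        # Clamp the distance gap so exp() cannot overflow.
        z = max(-50.0, min(50.0, d0 - d1))
        # Logistic squashing: a score above 0.5 means closer to class 1.
        edges[(i, j)] = 1.0 / (1.0 + exp(-z))
    return edges


# Toy data: 4 samples, 3 features, binary labels.
X = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [1.0, 0.9, 1.1], [0.8, 1.0, 0.9]]
y = [0, 0, 1, 1]

models = fit_pairwise_centroids(X, y)
graph_pos = sample_graph([0.9, 1.0, 1.0], models)  # resembles class 1
graph_neg = sample_graph([0.1, 0.0, 0.1], models)  # resembles class 0
```

With d features, each sample yields a graph with d(d-1)/2 weighted edges, so every classifier is fitted in only two dimensions regardless of d. In the framework described above, such graphs would then be passed to a GNN (GCN or GATv2) for the final prediction.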

Topics & Keywords

Advanced Graph Neural Networks · Machine Learning in Healthcare · Bioinformatics and Genomic Networks

Publication Details

Published in: Technologies

Volume 14, Issue 2, p. 84

DOI: 10.3390/technologies14020084

Field-Weighted Citation Impact: 0.00
