Biomedical image analysis is a cornerstone of modern healthcare, enabling early disease detection, treatment planning, and precision diagnostics. However, traditional deep learning methods, including convolutional neural networks and standalone transformers, often suffer from limited structural understanding, poor semantic consistency, and an inability to integrate multimodal biomedical information effectively. To address these challenges, we propose HGVT-MCF (Hierarchical Graph Vision Transformer with Multimodal Contrastive Fusion), a novel framework that unifies graph-based spatial reasoning, transformer-driven global attention, and contrastive multimodal fusion to deliver semantically consistent and biologically meaningful image analysis. The proposed architecture models complex topological relationships in medical images through hierarchical graph construction, while the transformer encoder captures long-range contextual dependencies. A contrastive fusion mechanism aligns visual, genomic, and clinical embeddings in a shared latent space, significantly enhancing predictive power. Experimental evaluations on multiple benchmark datasets (BRATS, CAMELYON, TCGA, and PAIP) demonstrate that HGVT-MCF outperforms state-of-the-art approaches, achieving superior segmentation accuracy (Dice $\approx 94.5\%$, IoU $\approx 88.6\%$) and classification performance (AUC $\approx 0.98$). Qualitative results further show more precise boundary delineation, improved interpretability, and greater robustness, supporting the clinical relevance of the framework.
This work is an important step toward next-generation, multimodal, context-aware biomedical artificial intelligence systems that can reliably characterize disease, support diagnosis, and inform clinical decision-making.
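The segmentation results are reported as Dice and IoU. As a reminder of how these standard overlap metrics are computed, here is a minimal sketch for a single pair of flattened binary masks. The function name `dice_and_iou` and the convention of returning 1.0 when both masks are empty are illustrative choices, not taken from the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two flattened binary masks given as sequences of 0/1.

    Illustrative helper; the empty-mask convention below is an assumption.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_area = sum(1 for p in pred if p)
    target_area = sum(1 for t in target if t)
    if pred_area + target_area == 0:
        # Both masks empty: treat as perfect agreement (a common, but not universal, convention).
        return 1.0, 1.0
    union = pred_area + target_area - intersection
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Usage: one predicted pixel overlaps the single ground-truth pixel.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
print(dice, iou)
```

For a single mask the two metrics are linked by $\mathrm{IoU} = D / (2 - D)$, where $D$ is the Dice score. Scores averaged over many images or classes need not satisfy this identity exactly, because averaging does not commute with that nonlinear mapping.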
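The abstract does not specify the contrastive objective used to align visual, genomic, and clinical embeddings. One common choice for this kind of alignment is a symmetric InfoNCE (CLIP-style) loss, in which the $i$-th embedding of one modality should match the $i$-th embedding of another modality rather than any other sample in the batch. The pure-Python sketch below illustrates that idea. It assumes that all three modalities have already been projected to the same dimension and that the visual embedding acts as the anchor for the genomic and clinical ones. The function names and the temperature of 0.07 are hypothetical and are not the authors' method.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (leave all-zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Mean cross-entropy where anchors[i] should pick positives[i] among all positives."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Cosine similarities of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        m = max(logits)  # subtract max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / n


def symmetric_contrastive_loss(x, y, temperature=0.07):
    """Average InfoNCE in both directions (x -> y and y -> x)."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def multimodal_alignment_loss(visual, genomic, clinical, temperature=0.07):
    """Hypothetical fusion objective: align genomic and clinical embeddings to visual ones."""
    return (symmetric_contrastive_loss(visual, genomic, temperature)
            + symmetric_contrastive_loss(visual, clinical, temperature))


# Usage: paired embeddings that already agree give a much lower loss than shuffled pairs.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned = multimodal_alignment_loss(visual, visual, visual)
shuffled = multimodal_alignment_loss(visual, [[0.0, 1.0], [1.0, 0.0]], visual)
print(aligned, shuffled)
```

Minimizing such a loss pulls embeddings of the same patient or sample together across modalities and pushes apart embeddings of different samples. This is what "a shared latent space" typically refers to in contrastive fusion.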