In recent decades, deep neural networks, particularly convolutional neural networks, have achieved state-of-the-art performance in various medical image segmentation tasks. Recently, the introduction of vision transformers (ViTs) has significantly altered the landscape of deep segmentation models, owing to their ability to capture long-range dependencies. However, we argue that the current design of ViT-based UNet (ViT-UNet) segmentation models is limited in handling the heterogeneous appearance (e.g., varying shapes and sizes) of target objects commonly encountered in medical image segmentation tasks. To tackle this limitation, we present a structured approach to introducing spatially dynamic components into a ViT-UNet, enabling the model to effectively capture features of target objects with diverse appearances. This is achieved by three main components: (i) deformable patch embedding; (ii) spatially dynamic multi-head attention; and (iii) multi-scale deformable positional encoding. These components are integrated into a novel architecture, termed AgileFormer, enabling more effective capture of heterogeneous objects at every stage of a ViT-UNet. Experiments on three segmentation tasks using publicly available datasets (Synapse multi-organ, ACDC cardiac, and Decathlon brain tumor) demonstrated the effectiveness of AgileFormer for 2D and 3D segmentation. Remarkably, our AgileFormer sets a new state-of-the-art performance, with Dice scores of 85.74% and 87.43% for 2D and 3D multi-organ segmentation on Synapse, without significant computational overhead. Our code is available at https://github.com/sotiraslab/AgileFormer.

• AgileFormer captures spatially varying features in medical image segmentation.
• Patch embedding and positional encoding are as crucial as self-attention in ViT-UNet.
• AgileFormer achieves SOTA on multi-organ, cardiac, and brain tumor segmentation.
• AgileFormer scales well, enhancing segmentation accuracy as model size increases.
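To make component (i) concrete: a standard ViT patch embedding samples pixels on a rigid grid, whereas a deformable patch embedding shifts each sampling point by a spatial offset before projection, so patches can adapt to object shape. The following is a minimal NumPy sketch of that idea, not the authors' implementation (AgileFormer learns the offsets end-to-end; here `offsets` is a placeholder array, and `bilinear_sample` and `deformable_patch_embed` are hypothetical helper names):

```python
import numpy as np

def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D image at fractional coordinates (y, x),
    clamping to the image border."""
    H, W = img.shape
    y = float(np.clip(y, 0, H - 1))
    x = float(np.clip(x, 0, W - 1))
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_patch_embed(img, patch=4, embed_dim=8, offsets=None, seed=0):
    """Embed non-overlapping patches whose sampling grid is displaced by
    per-point (dy, dx) offsets. In AgileFormer these offsets are learned;
    zero offsets reduce this to rigid ViT patch embedding."""
    H, W = img.shape
    gh, gw = H // patch, W // patch
    if offsets is None:
        offsets = np.zeros((gh, gw, patch, patch, 2))
    rng = np.random.default_rng(seed)
    W_proj = rng.standard_normal((patch * patch, embed_dim)) * 0.1  # stand-in projection
    tokens = np.empty((gh * gw, embed_dim))
    for i in range(gh):
        for j in range(gw):
            vals = np.empty((patch, patch))
            for u in range(patch):
                for v in range(patch):
                    dy, dx = offsets[i, j, u, v]
                    vals[u, v] = bilinear_sample(img, i * patch + u + dy,
                                                 j * patch + v + dx)
            tokens[i * gw + j] = vals.ravel() @ W_proj
    return tokens
```

An 8×8 image with `patch=4` yields a 2×2 grid of 4 tokens; feeding non-zero offsets shifts where each token looks, which is what lets the embedding follow objects of varying shape and size.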
Published in: Biomedical Signal Processing and Control
Volume 112, Article 108842