Segmentation of volumetric medical images is important for computer-aided diagnosis, treatment planning, and monitoring of disease progression. However, current CNN-based approaches may struggle to capture long-range dependencies across three-dimensional space, and they may lose fine anatomical detail through down-sampling. We introduce TransVol-Net, a 3D transformer-based segmentation framework with three components: a 3D patch embedding, a hybrid convolution-transformer encoder-decoder backbone, and a multi-scale fusion refinement head. The architecture uses window-based multi-head self-attention for efficient global context modeling while leveraging convolutional layers to preserve local texture information. TransVol-Net is evaluated on the BraTS dataset, where it achieves mean Dice scores of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\geq 91.0 {\%}$</tex> (WT), <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\geq 88.0 {\%}$</tex> (TC), and <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\geq 85.0 {\%}$</tex> (ET) across all tumor regions, exceeding state-of-the-art methods such as 3D U-Net, TransBTS, and Swin UNETR. The results further show increased sensitivity to small lesions and smoother boundary delineation of tumor voxels compared with other state-of-the-art segmentation models. In conclusion, TransVol-Net offers a scalable and clinically practical approach to volumetric segmentation, applicable to CT, MRI, and PET-CT imaging workflows.
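To illustrate the 3D patch embedding step named above, the following is a minimal numpy sketch. The paper does not publish code, so the patch size (4), embedding dimension (96), and the random projection standing in for the learned linear layer are all assumptions for illustration only.

```python
import numpy as np

def patch_embed_3d(volume, patch=4, dim=96, seed=0):
    """Split a (D, H, W) volume into non-overlapping patch^3 cubes and
    linearly project each flattened cube to a dim-dimensional token.
    NOTE: hypothetical sketch; not the authors' implementation."""
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    # Rearrange the volume into (num_patches, patch**3) flattened cubes.
    cubes = (volume
             .reshape(d // patch, patch, h // patch, patch, w // patch, patch)
             .transpose(0, 2, 4, 1, 3, 5)
             .reshape(-1, patch ** 3))
    # A fixed random projection stands in for the learned embedding weights.
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((patch ** 3, dim)) / np.sqrt(patch ** 3)
    return cubes @ proj  # shape: (num_patches, dim)

tokens = patch_embed_3d(np.zeros((32, 32, 32)))
print(tokens.shape)  # (512, 96): 8*8*8 patch tokens, each embedded in 96 dims
```

The resulting token sequence is what a window-based multi-head self-attention stage would then partition into local windows, keeping attention cost linear in the number of patches rather than quadratic over the full volume.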