Vision Transformer (ViT) models are well known for capturing global contextual information through self-attention. In contrast, ConvNeXt's hierarchical convolutional inductive bias extracts robust multi-scale features at lower computational and memory cost, making it well suited to settings with limited annotated data and constrained computational resources. Accordingly, a multi-scale UNet architecture built on a ConvNeXt backbone is proposed for brain tumor segmentation, equipped with a spatial latent module and Reverse Attention (RA)-guided skip connections. This framework jointly models long-range context and delineates reliable tumor boundaries. Magnetic resonance images from the BraTS 2021, 2023, and 2024 datasets are used to evaluate segmentation performance. The incorporated multi-scale features notably improve the segmentation of small enhancing regions and peripheral tumor boundaries, which single-scale baselines frequently miss. On BraTS 2021, the model achieves a Dice similarity coefficient (DSC) of 0.8956 and a mean intersection over union (IoU) of 0.8122, with a sensitivity of 0.8761, a specificity of 0.9964, and an accuracy of 0.9878. On BraTS 2023, it attains a DSC of 0.9235 and an IoU of 0.8592, with a sensitivity of 0.9037, a specificity of 0.9977, and an accuracy of 0.9904. On BraTS 2024, it yields a DSC of 0.9225 and an IoU of 0.8575, with a sensitivity of 0.8989, a specificity of 0.9979, and an accuracy of 0.9903. Overall, the resulting segmentations provide spatially explicit contours that support lesion-area estimation, precise boundary delineation, and slice-wise longitudinal assessment.
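The abstract does not spell out the RA formulation, so the following PyTorch module is a minimal illustrative sketch of the reverse-attention gating commonly used in segmentation networks: the complement of a sigmoid-activated coarse decoder prediction re-weights encoder skip features toward regions the coarse map missed, which is typically the boundary detail this paper targets. The module name, the refinement layers, and the residual connection are assumptions for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionSkip(nn.Module):
    """Illustrative reverse-attention-gated skip connection (not the paper's exact layers)."""

    def __init__(self, enc_channels: int):
        super().__init__()
        # Hypothetical refinement block applied to the gated skip features.
        self.refine = nn.Sequential(
            nn.Conv2d(enc_channels, enc_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(enc_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, enc_feat: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
        # Upsample the 1-channel coarse prediction to the skip resolution.
        pred = F.interpolate(coarse_pred, size=enc_feat.shape[2:],
                             mode="bilinear", align_corners=False)
        # Reverse attention: emphasize what the coarse prediction did NOT capture.
        reverse_map = 1.0 - torch.sigmoid(pred)
        gated = enc_feat * reverse_map          # broadcasts over channels
        # Residual connection keeps the original encoder context available.
        return self.refine(gated) + enc_feat

if __name__ == "__main__":
    skip = ReverseAttentionSkip(enc_channels=64)
    enc_feat = torch.randn(1, 64, 56, 56)       # encoder skip features
    coarse = torch.randn(1, 1, 28, 28)          # coarse decoder logits at lower resolution
    print(skip(enc_feat, coarse).shape)         # torch.Size([1, 64, 56, 56])
```

Gating with 1 - sigmoid(pred) rather than sigmoid(pred) is the defining choice of reverse attention: regions already predicted confidently receive low weight, steering the decoder's capacity toward uncaptured peripheral structure.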
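For reference, the reported metrics follow their standard definitions (stated here for clarity, not reproduced from the paper body), with $P$ the predicted mask, $G$ the ground-truth mask, and TP, TN, FP, FN the usual confusion-matrix counts:

$$\mathrm{DSC}=\frac{2\,|P\cap G|}{|P|+|G|}=\frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}},\qquad \mathrm{IoU}=\frac{|P\cap G|}{|P\cup G|}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}$$

$$\mathrm{Sensitivity}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},\qquad \mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}},\qquad \mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$

Per mask, the identity $\mathrm{DSC}=2\,\mathrm{IoU}/(1+\mathrm{IoU})$ holds; the reported pairs satisfy it only approximately (e.g., 0.8122 IoU maps to 0.8964, versus the reported 0.8956 DSC), which is consistent with the metrics being averaged over cases or classes after per-case computation.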