Search for a command to run...
In order to address challenges in small object recognition for remote sensing imagery—including high model complexity, overfitting with small samples, and insufficient cross-scenario generalization—this study proposes CAGM-Seg, a lightweight recognition model integrating multi-attention mechanisms. The model systematically enhances the U-Net architecture: First, the encoder adopts a pre-trained MobileNetV3-Large as the backbone network, incorporating a coordinate attention mechanism to strengthen spatial localization of min targets. Second, an attention gating module is introduced in skip connections to achieve adaptive fusion of cross-level features. Finally, the decoder fully employs depthwise separable convolutions to significantly reduce model parameters. This design embodies a symmetry-aware philosophy, which is reflected in two aspects: the structural symmetry between the encoder and decoder facilitates multi-scale feature fusion, while the coordinate attention mechanism performs symmetric decomposition of spatial context (i.e., along height and width directions) to enhance the perception of geometrically regular small targets. Regarding training strategy, a hybrid loss function combining Dice Loss and Focal Loss, coupled with the AdamW optimizer, effectively enhances the model’s sensitivity to small objects while suppressing overfitting. Experimental results on the Xingtai black and odorous water body identification task demonstrate that CAGM-Seg outperforms comparison models in key metrics including precision (97.85%), recall (98.08%), and intersection-over-union (96.01%). Specifically, its intersection-over-union surpassed SegNeXt by 11.24 percentage points and PIDNet by 8.55 percentage points; its F1 score exceeded SegFormer by 2.51 percentage points. Regarding model efficiency, CAGM-Seg features a total of 3.489 million parameters, with 517,000 trainable parameters—approximately 80% fewer than the baseline U-Net—achieving a favorable balance between recognition accuracy and computational efficiency. Further cross-task validation demonstrates the model’s robust cross-scenario adaptability: it achieves 82.77% intersection-over-union and 90.57% F1 score in landslide detection, while maintaining 87.72% precision and 86.48% F1 score in cloud detection. The main contribution of this work is the effective resolution of key challenges in few-shot remote sensing small-object recognition—notably inadequate feature extraction and limited model generalization—via the strategic integration of multi-level attention mechanisms within a lightweight architecture. The resulting model, CAGM-Seg, establishes an innovative technical framework for real-time image interpretation under edge-computing constraints, demonstrating strong potential for practical deployment in environmental monitoring and disaster early warning systems.