Search for a command to run...
Object detection in unmanned aerial vehicle (UAV) imagery remains a crucial yet challenging task due to complex backgrounds, large scale variations, and the prevalence of small objects. Visible-spectrum images lack robustness under all-weather and all-illumination conditions; by contrast, multispectral sensing provides complementary cues (e.g., thermal signatures) that improve detection robustness. However, existing multispectral solutions often incur high computational costs and are therefore difficult to deploy on resource-constrained UAV platforms. To address these issues, SG-YOLO is proposed, a lightweight and efficient multispectral object detection framework that aims to balance accuracy and efficiency. First, a Spectral Gated Downsampling Stem (SGDS) is designed, in which grouped convolutions and a gating mechanism are employed at the early stage of the network to extract band-specific features, thereby maximizing spectral complementarity while minimizing redundancy. Second, a Spectral–Spatial Iterative Attention Fusion (SSIAF) module is introduced, in which spectral-wise (channel) attention and spatial-wise attention are iteratively coupled and cascaded in a multi-scale manner to jointly model cross-band dependencies and spatial saliency, thereby aggregating high-level semantic information while suppressing redundant spectral responses. Finally, a Spatial–Channel Synergistic Fusion (SCSF) module is designed to enhance multi-scale and cross-channel feature integration in the neck. Experiments on the MODA dataset show that SG-YOLOs achieves 72.4% mAP50, outperforming the baseline by 3.2%. Moreover, compared with a range of mainstream one-stage detectors and multispectral detection methods, SG-YOLO delivers the best overall performance, providing an effective solution for UAV object detection while maintaining a favorable trade-off between model size and detection accuracy.