Abstract
Driven by the rapid advancement of unmanned aerial vehicle technology, aerial image object detection has found extensive applications in domains such as intelligent transportation and emergency response. Although Transformer-based detection frameworks have achieved remarkable progress, existing methods still struggle with small objects in complex aerial scenes because of limited receptive fields, insufficient multi-scale feature fusion, and inadequate cross-scale information aggregation. To address these limitations, this paper proposes CAB-DETR, a lightweight detection algorithm designed to improve small object detection performance. First, this work introduces a multi-dilation attention block in the backbone network, which combines parallel multi-dilation convolution with a position-aware attention mechanism to dynamically expand the receptive field and enhance feature representation for small objects. Second, a Bidirectional Synergistic Fusion Pyramid Network (BDSFPN) is designed that employs a novel Symbiotic Gating Fusion Module (SGFM) and a Hierarchical Collaborative Fusion Module (HCFM) to systematically optimize information flow in the feature pyramid, thereby better preserving and exploiting small object features. Finally, an efficient Self-adaptive Orthogonal Attention Transformer Layer (SOAT-Layer) is proposed that performs sparse attention computation in a reduced-dimensional space, significantly decreasing computational overhead while improving the model's detection capability for small objects.
Extensive experiments on the VisDrone2019 dataset demonstrate that, compared to the baseline model, CAB-DETR reduces model parameters by 14.1% while improving mAP$_{0.5}$ from 46.0% to 49.7% (+3.7 percentage points) and mAP$_{0.5:0.95}$ from 27.7% to 30.8% (+3.1 percentage points). Furthermore, the method generalizes well to the Seaperson, Tinyperson, and TT100k datasets, further validating the effectiveness and robustness of the proposed algorithm.
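The receptive-field effect of parallel multi-dilation convolution mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' block: the function names, the dilation rates (1, 2, 3), the averaging of branches, and the 1D setting are all assumptions made for illustration, and the position-aware attention part is omitted entirely.

```python
def receptive_field(kernel_size, dilation):
    """Span (in input samples) covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal, kernel, dilation):
    """'Same'-size 1D dilated correlation with zero padding (odd kernel length assumed)."""
    k = len(kernel)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart around position i.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_dilation_block(signal, kernel, dilations=(1, 2, 3)):
    """Hypothetical parallel block: run one branch per dilation rate, then average."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


# Feed a unit impulse to see which output positions the block can reach.
impulse = [0.0] * 11
impulse[5] = 1.0
response = multi_dilation_block(impulse, [1.0, 1.0, 1.0])
support = [i for i, v in enumerate(response) if v != 0.0]
print(support)
print(receptive_field(3, 3))
```

With a kernel of size 3, the widest branch (dilation 3) spans 7 input samples, so the combined block reaches the same range as a 7-tap kernel while each branch keeps only 3 weights. The narrower branches still contribute the fine-grained local context that small objects depend on.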
Published in: Measurement Science and Technology
Volume 37, Issue 13, pp. 135401-135401