CAB-DETR: a lightweight small object detection algorithm based on cross-scale attention and bidirectional feature fusion

2026 · 0 citations · Journal Article · Hybrid Open Access

Authors

Ruxin Gao · Henan Polytechnic University

Wenxuan Fan · Henan Polytechnic University

Jing Li · Institute of Geographic Sciences and Natural Resources Research

Haiquan Jin · Digital China Health (China)

Abstract

Driven by the rapid advancement of unmanned aerial vehicle technology, aerial image object detection has found extensive applications in domains such as intelligent transportation and emergency response. Although Transformer-based detection frameworks have achieved remarkable progress, existing methods still encounter challenges when processing small objects in complex aerial scenarios, including limited receptive fields, insufficient multi-scale feature fusion, and inadequate cross-scale information aggregation. To address these limitations, this paper proposes CAB-DETR, a lightweight detection algorithm designed to improve small object detection performance. First, a multi-dilation attention block is introduced in the backbone network; it combines parallel multi-dilation convolution with a position-aware attention mechanism to dynamically expand the receptive field and enhance feature representation for small objects. Second, a Bidirectional Synergistic Fusion Pyramid Network (BDSFPN) is designed that employs a novel Symbiotic Gating Fusion Module (SGFM) and a Hierarchical Collaborative Fusion Module (HCFM) to optimize information flow in the feature pyramid, thereby preserving and exploiting small object features. Finally, an efficient Self-adaptive Orthogonal Attention Transformer Layer (SOAT-Layer) is proposed that performs sparse attention computation in a reduced-dimensional space, significantly decreasing computational overhead while improving the model’s detection capability for small objects.
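As a rough illustration of why parallel dilated convolutions widen the receptive field, the sketch below computes the spatial extent covered by a single dilated kernel using the standard relation k + (k - 1)(d - 1). The 3x3 kernel size and the dilation rates (1, 2, 3) are assumptions chosen for illustration; the abstract does not state the values used in CAB-DETR, and this is not the authors' implementation.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent (in pixels, along one axis) spanned by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Hypothetical settings: the abstract does not list kernel sizes or dilation rates.
ASSUMED_KERNEL_SIZE = 3
ASSUMED_DILATIONS = (1, 2, 3)


def branch_extents(kernel_size: int = ASSUMED_KERNEL_SIZE,
                   dilations: tuple = ASSUMED_DILATIONS) -> dict:
    """Map each parallel branch's dilation rate to the extent its kernel spans."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


extents = branch_extents()
for dilation, extent in extents.items():
    # Every branch uses the same number of weights but spans a wider window.
    weights = ASSUMED_KERNEL_SIZE * ASSUMED_KERNEL_SIZE
    print(f"dilation={dilation}: spans {extent}x{extent} pixels with {weights} weights")
```

Under these assumed settings, each branch keeps the same parameter count while covering a progressively larger window, which is the general motivation for running several dilation rates in parallel.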
Extensive experiments on the VisDrone2019 dataset demonstrate that, compared to the baseline model, CAB-DETR reduces model parameters by 14.1% while improving mAP$_{0.5}$ from 46.0% to 49.7% (+3.7 percentage points) and mAP$_{0.5:0.95}$ from 27.7% to 30.8% (+3.1 percentage points). Furthermore, the method exhibits strong generalization performance on the Seaperson, Tinyperson, and TT100k datasets, further validating the effectiveness and robustness of the proposed algorithm.
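The reported gains are absolute differences between percentages. A small sketch, using only the four mAP figures quoted in the abstract, makes the distinction between absolute and relative improvement explicit. The function and dictionary names are illustrative, not taken from the paper.

```python
from typing import Tuple


def gains(baseline: float, proposed: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = round(proposed - baseline, 1)
    relative = round(100.0 * (proposed - baseline) / baseline, 1)
    return absolute, relative


# Baseline and CAB-DETR values on VisDrone2019, as quoted in the abstract.
REPORTED = {
    "mAP@0.5": (46.0, 49.7),
    "mAP@0.5:0.95": (27.7, 30.8),
}

summary = {name: gains(base, new) for name, (base, new) in REPORTED.items()}
for name, (absolute, relative) in summary.items():
    print(f"{name}: +{absolute} points absolute, +{relative}% relative to baseline")
```

Read this way, the 3.7-point and 3.1-point gains correspond to relative improvements of roughly 8% and 11% over the baseline scores.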

Topics & Keywords

Advanced Neural Network Applications · Advanced Data and IoT Technologies · Multimodal Machine Learning Applications

UN Sustainable Development Goals

Life below water

Publication Details

Published in: Measurement Science and Technology

Volume 37, Issue 13, Article 135401

DOI: 10.1088/1361-6501/ae54bc

Field-Weighted Citation Impact: 0.00