Unmanned aerial vehicle (UAV) small object detection faces three critical challenges: irreversible loss of geometric detail during multi-level downsampling, cross-scale feature distortion caused by interpolation blur and aliasing, and limited long-range dependency modeling due to constrained receptive fields. To address these limitations, we propose HMF-DEIM (High-Fidelity Multi-Domain Fusion Transformer for UAV Small Object Detection), an end-to-end detection architecture. First, we design a lightweight hierarchical differentiation backbone that removes the redundant deepest-layer features (P5) to prevent loss of tiny-object information; it employs Multi-Domain Feature Blending (MDFB) in shallow layers to preserve geometric detail and a Hierarchical Attention-guided Feature Modulation Block (HAFMB) in deep layers for global semantic modeling. Second, we develop a full-chain high-fidelity feature transformation framework comprising Channel-Adaptive Shift Upsampling (CASU) for interpolation-free resolution recovery, Multi-scale Context Alignment Fusion (MCAF) for bridging deep–shallow semantic gaps via bidirectional gating, and Diversified Residual Frequency-aware Downsampling (DRFD) for aliasing suppression through a three-branch parallel architecture. Finally, we devise the FocusFeature module, which aligns multi-scale features to a unified scale and applies parallel multi-scale large-kernel depthwise convolutions to capture cross-scale long-range dependencies, generating dual-scale (P3/P4) features that balance detail and semantics. Experiments demonstrate that HMF-DEIM outperforms DEIM on VisDrone2019 test by 0.405 mAP50 (+2.1%) and 0.235 mAP50-95 (+1.6%), with a 21.3% relative improvement in APs for tiny objects, while maintaining real-time inference (465 FPS with TensorRT FP16) on an NVIDIA A100 GPU with only 11.87M parameters and 34.1 GFLOPs.
Further validation on the AI-TOD v2 and DOTA v1.5 datasets confirms robust generalization across diverse aerial scenarios, making HMF-DEIM a practical solution for resource-constrained UAV applications.
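
To make the motivation for dropping P5 concrete, the following is a minimal arithmetic sketch, not part of the proposed method. It assumes the conventional pyramid strides (P3 = 8, P4 = 16, P5 = 32), a hypothetical 640 x 640 input, and a hypothetical 12-pixel object; none of these values are stated in the abstract, and the function names are illustrative only.

```python
# Assumption: conventional feature-pyramid strides, where level Pk has stride 2**k.
# The abstract does not state these values; they follow common detector practice.
PYRAMID_STRIDES = {"P3": 8, "P4": 16, "P5": 32}


def grid_size(input_size: int, stride: int) -> int:
    """Side length of the feature map produced from a square input."""
    return input_size // stride


def object_footprint(object_px: float, stride: int) -> float:
    """Number of feature-map cells an object spans along one axis."""
    return object_px / stride


def summarize(input_size: int = 640, object_px: float = 12.0) -> dict:
    """Map each pyramid level to (grid side length, object footprint in cells).

    input_size and object_px are hypothetical illustration values.
    """
    return {
        level: (grid_size(input_size, stride), object_footprint(object_px, stride))
        for level, stride in PYRAMID_STRIDES.items()
    }


if __name__ == "__main__":
    for level, (side, cells) in summarize().items():
        print(f"{level}: {side}x{side} grid, object spans {cells:.3f} cells per axis")
```

Under these assumptions, the 12-pixel object covers more than one cell per axis only at P3, and well under half a cell at P5, which is consistent with the abstract's claim that the deepest level contributes little for tiny objects.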