Fully automated, unmanned construction operations require accurate, real-time, multi-scale object detection under complex site conditions. To address this need, we propose the Progressive Refined Real-Time Detection Transformer (PRRT-DETR), built on the lightweight Real-Time Detection Transformer (RT-DETR). First, we introduce deformable convolution and design a Fusion-Enhanced Multi-Stream Coordinate Attention module to improve the model's adaptability to geometric deformations during feature extraction. Second, we replace the Multi-Head Self-Attention used in the Attention-based Intra-scale Feature Interaction (AIFI) module with a more computationally efficient additive attention, which improves robustness while reducing complexity. Third, we redesign the CNN-based Cross-scale Feature Fusion Module (CCFM) as a novel Scale-Aware Adaptive Feature Fusion Module to strengthen the fusion of multi-scale features. Extensive experiments on a self-constructed tunnel boring dataset and the Construction Site Safety Dataset demonstrate the generalization ability and practical value of the proposed model. On the Construction Site Safety Dataset, PRRT-DETR improves precision by 6.5 percentage points and recall by 0.8 percentage points while maintaining a comparable parameter count and frame rate, validating its effectiveness for real-time object detection in complex environments.
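To illustrate why an additive attention can be cheaper than Multi-Head Self-Attention, the following is a minimal NumPy sketch of one common linear-complexity formulation, in which the tokens are pooled into a single global query that then modulates the keys element-wise. The abstract does not state which additive attention variant the model uses, so the formulation, the function name `additive_attention`, the weight names, and the token and channel sizes are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def additive_attention(x, w_q, w_k, w_a, w_o):
    """Hypothetical additive attention over n tokens of width d.

    x:   (n, d) input tokens (e.g. a flattened feature map)
    w_q: (d, d) query projection
    w_k: (d, d) key projection
    w_a: (d,)   learnable vector that scores each query token
    w_o: (d, d) output projection
    Returns the (n, d) output and the (n,) token weights.
    """
    q = x @ w_q                          # (n, d) queries
    k = x @ w_k                          # (n, d) keys
    d = q.shape[1]

    # One scalar score per token: O(n * d) work, versus the
    # O(n^2 * d) pairwise score matrix of dot-product self-attention.
    scores = (q @ w_a) / np.sqrt(d)      # (n,)
    alpha = softmax(scores, axis=0)      # (n,), sums to 1

    # Pool all queries into a single global query vector.
    global_q = alpha @ q                 # (d,)

    # Modulate every key by the global query (broadcast over tokens),
    # project, and add the queries back as a residual path.
    out = (k * global_q) @ w_o + q       # (n, d)
    return out, alpha


# Illustrative sizes only: 400 tokens with 64 channels.
rng = np.random.default_rng(0)
n, d = 400, 64
x = rng.standard_normal((n, d))
w_q = rng.standard_normal((d, d)) / np.sqrt(d)
w_k = rng.standard_normal((d, d)) / np.sqrt(d)
w_a = rng.standard_normal(d)
w_o = rng.standard_normal((d, d)) / np.sqrt(d)

out, alpha = additive_attention(x, w_q, w_k, w_a, w_o)
print(out.shape)
```

In this sketch no n-by-n attention matrix is ever formed, so the cost grows linearly with the number of tokens, which is the kind of saving the abstract attributes to the replacement.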
DOI: 10.1117/12.3096128