Most 2D human pose estimation frameworks rely on static designs for multi-scale feature fusion, integrating information from different scales with fixed weights. A drawback of these approaches is that they often produce localization biases in complex scenarios. This paper addresses the problems of multi-scale feature mismatch and joint localization bias in pose estimation. From the perspective of feature processing, multi-scale weights must adapt to the size and position of joints, while joint predictions should respect human anatomical constraints. Existing methods lack effective dynamic adaptation, structural constraints, and bidirectional complementarity between high-level semantics and low-level details; they often mislocalize joints in occluded scenes, and their heatmap peaks frequently deviate from the true joint positions. Through theoretical analysis, we identify the causes of these performance gaps and propose directions for narrowing them. We propose Bidirectional Multi-Scale Collaborative Pose Estimation (BiMS-Pose), a framework that introduces dynamic weights to adjust feature proportions, establishes bidirectional topological constraints on joint relationships, and integrates a bidirectional attention flow. The framework filters key information along three dimensions, adjusts its filtering strategies in real time, and is further enhanced by heatmap optimization to improve localization accuracy. Extensive experiments on COCO, MPII, and our self-built Orchard Spraying Pose Dataset (OSPD) demonstrate the effectiveness of BiMS-Pose. In general scenarios, it achieves a 1.2 percentage-point increase in average precision (AP) on the COCO val2017 dataset over ViTPose with the same backbone.
In agricultural orchard spraying scenarios, it effectively handles interference factors such as illumination changes, occlusion, and varying shooting distances, achieving 75.4% AP and 90.7% PCKh@0.5 (head-normalized percentage of correct keypoints) on the OSPD dataset. It also sustains an average frame rate of 18.3 FPS on embedded devices, meeting real-time monitoring requirements. These results highlight the model’s potential for precise, stable, and practical human pose estimation in both general and agricultural application scenarios.
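To make the dynamic-weight fusion idea concrete: instead of combining multi-scale feature maps with fixed scalar weights, a small head can predict position-wise logits per scale, which a softmax turns into fusion weights that adapt to joint size and location. The sketch below is a minimal, hypothetical NumPy illustration of this general mechanism, not the authors' BiMS-Pose implementation; the shapes and the `weight_logits` head are assumptions.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_fusion(features, weight_logits):
    """Fuse S same-resolution feature maps with per-position dynamic weights.

    features:      (S, C, H, W) feature maps already resized to a common scale.
    weight_logits: (S, H, W) logits, assumed to come from a small conv head
                   (hypothetical). Softmax over the scale axis S yields
                   position-wise weights, unlike static fusion where the
                   weights are fixed scalars shared across all positions.
    """
    w = softmax(weight_logits, axis=0)            # (S, H, W), sums to 1 over scales
    return (features * w[:, None, :, :]).sum(0)   # (C, H, W) fused map

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 8, 4, 4))   # 3 scales, 8 channels, 4x4 grid
logits = rng.standard_normal((3, 4, 4))
fused = dynamic_fusion(feats, logits)
print(fused.shape)  # (8, 4, 4)
```

Because the weights vary per spatial position, large joints can draw more from coarse, semantic scales while small or occluded joints lean on fine-detail scales, which is the adaptation static fusion lacks.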