Human pose estimation (HPE) aims to localize human keypoints from visual inputs, but balancing high accuracy with computational efficiency remains a persistent challenge in resource-constrained and real-time scenarios. To address this challenge, we propose CDO-POSE, a lightweight method based on an improved YOLOv11. Specifically, we first introduce the Context Anchor Attention (CAA) module, composed of three convolutional layers and two bottleneck modules, to enhance feature representation while maintaining parameter efficiency. Next, to address the limited precision of traditional nearest-neighbor upsampling, we incorporate the Dynamic Sampling (DySample) method, which adaptively adjusts the sampling strategy according to feature importance, thereby improving upsampling accuracy. Furthermore, to align the training objective more closely with the goal of precise pose estimation, we employ the Object Keypoint Similarity Loss (OKS-Loss), which provides a more discriminative evaluation of keypoint localization errors. Experiments on the MS COCO2017 and CrowdPose datasets demonstrate that our model achieves almost the same accuracy as YOLOv11s-pose with significantly fewer parameters. Moreover, the model achieves 39.79 FPS and 29.23 FPS for inference at 480p and 720p, respectively, on the NVIDIA Jetson Orin Nano, suggesting that it is suitable for real-time deployment on edge devices.
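To make the OKS-based objective concrete, the following is a minimal sketch of an OKS loss for a single person instance. It assumes the COCO-style definition of Object Keypoint Similarity, in which each labeled keypoint contributes exp(-d² / (2 · area · (2σ)²)), and takes the loss as 1 − OKS. The abstract does not give the exact formulation used in CDO-POSE, so the function names (`oks`, `oks_loss`), the `eps` term, and the handling of instances without labeled keypoints are illustrative assumptions. A training implementation would use a differentiable tensor library rather than plain Python.

```python
import math


def oks(pred, target, visible, sigmas, area, eps=1e-9):
    """COCO-style Object Keypoint Similarity for one person instance.

    pred, target: sequences of (x, y) keypoint coordinates.
    visible: per-keypoint visibility flags (> 0 means labeled).
    sigmas: per-keypoint falloff constants (dataset-specific).
    area: object scale, i.e. the ground-truth instance area in pixels.
    """
    total, count = 0.0, 0
    for (px, py), (tx, ty), v, sigma in zip(pred, target, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - tx) ** 2 + (py - ty) ** 2      # squared pixel distance
        k2 = (2.0 * sigma) ** 2                   # per-keypoint tolerance
        total += math.exp(-d2 / (2.0 * (area + eps) * k2))
        count += 1
    # Assumption: an instance with no labeled keypoints scores 0.
    return total / count if count else 0.0


def oks_loss(pred, target, visible, sigmas, area):
    """Loss that falls to 0 as the predicted pose matches the target."""
    return 1.0 - oks(pred, target, visible, sigmas, area)
```

Because each keypoint's error is normalized by the object area and a per-keypoint constant, the same pixel offset is penalized more heavily on a small person than on a large one, which is what ties the loss to the OKS evaluation metric.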