In existing RGB-D vision systems, captured depth maps often suffer from artifacts such as holes and noise, which lead to high model complexity and limited accuracy in pose estimation. To address this issue, this paper proposes an end-to-end joint network architecture for RGB-D depth completion and lightweight pose estimation. Specifically, in the depth completion stage, a Non-Local Spatial Propagation Network (NLSPN) is introduced to enhance the recovery of depth information and object boundary structures through a non-local neighborhood propagation mechanism. Furthermore, a Convolutional Block Attention Module (CBAM) is incorporated into the NLSPN feature extraction process, and an edge-aware loss function is designed to strengthen the network's focus on texture and boundary details, thereby improving the structural consistency of the completed depth maps. In the pose estimation stage, a teacher-student knowledge distillation framework is constructed: the teacher model is a high-precision PoseNet, and the lightweight student model (LightPoseNet) is trained under the joint supervision of a soft-target loss, a feature distillation loss, and a hard-target loss, enabling a significant reduction in model parameters and computational complexity while maintaining high pose estimation accuracy. Experiments show that the proposed method significantly improves both the completeness of the depth maps and the accuracy of pose estimation, while exhibiting favorable speed and deployment efficiency, making it suitable for RGB-D vision tasks on resource-constrained platforms.
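To make the edge-aware constraint concrete, the following is a minimal NumPy sketch of one common form of such a loss: a standard L1 depth term plus a penalty on the mismatch between the horizontal and vertical gradients of the predicted and ground-truth depth maps, which emphasises object boundaries. The exact formulation and the weight `w_edge` are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def edge_aware_loss(d_pred, d_gt, w_edge=0.5):
    """Illustrative edge-aware loss for depth completion.

    Combines a pixel-wise L1 term with an L1 penalty on the
    difference of depth gradients, so errors along object
    boundaries are penalised more strongly. `w_edge` is a
    hypothetical balancing weight.
    """
    l1 = np.mean(np.abs(d_pred - d_gt))
    # Horizontal and vertical gradient mismatches via finite differences.
    gx = np.abs(np.diff(d_pred, axis=1) - np.diff(d_gt, axis=1))
    gy = np.abs(np.diff(d_pred, axis=0) - np.diff(d_gt, axis=0))
    return l1 + w_edge * (np.mean(gx) + np.mean(gy))
```

A perfect prediction yields zero loss, and a prediction that is smooth where the ground truth has a depth discontinuity is penalised through the gradient terms.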
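The three-term supervision of the student network can be sketched as a weighted sum, here for a pose-regression setting: a soft-target term that pulls the student's pose toward the teacher's prediction, a feature distillation term that matches intermediate features, and a hard-target term against ground truth. The L2 form of each term and the weights are assumptions for illustration; the paper's actual losses and weighting may differ.

```python
import numpy as np

def pose_distillation_loss(pose_s, pose_t, pose_gt, feat_s, feat_t,
                           w_soft=0.4, w_feat=0.3, w_hard=0.3):
    """Illustrative joint distillation loss for LightPoseNet.

    pose_s / pose_t / pose_gt: student, teacher, and ground-truth
    pose vectors (e.g. translation + rotation parameters).
    feat_s / feat_t: intermediate feature maps of student and teacher.
    The weights are hypothetical, not taken from the paper.
    """
    soft = np.mean((pose_s - pose_t) ** 2)   # soft-target: mimic the teacher
    feat = np.mean((feat_s - feat_t) ** 2)   # feature distillation: match internals
    hard = np.mean((pose_s - pose_gt) ** 2)  # hard-target: ground-truth supervision
    return w_soft * soft + w_feat * feat + w_hard * hard
```

In practice the feature term requires the student's feature maps to be projected to the teacher's dimensionality (e.g. by a 1x1 convolution) before comparison; that adapter is omitted here for brevity.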