Estimating 3D hand poses from single RGB images for industrial robot teleoperation
  • Digang Sun
  • Ping Zhang

School of Computer Science and Engineering, South China University of Technology

Corresponding Author: [email protected]
Abstract

3D hand pose estimation from single RGB images is challenging because self-occlusion and the absence of depth information make it difficult to regress the relative depth between hand joints and to produce biomechanically feasible hand poses. To address these issues, we propose a Prior-knowledge Aware and Mesh-Supervised Network (PAMSNet) that integrates the knowledge implied in the hand's articulated structure with the knowledge contained in hand meshes. We explore and interpret this knowledge from a novel perspective inspired by cognitive psychology and divide it into implicit and explicit categories: the former is difficult to formulate and must be learned from data, while the latter can be embedded in loss functions. We estimate 3D poses by fusing the hand's 2D pose and texture features. Hand meshes produced by a parameterized hand model serve as a regularizer to optimize feature extraction. Furthermore, an extended 128-joint hand skeleton model is proposed to generate denser heatmaps that provide approximately mask-aware spatial attention. Experimental results show that our method is competitive with the state of the art on two public datasets and superior in generalization ability, with a more efficient architecture. Finally, we apply the estimated 3D hand poses to control the moving direction and orientation of the robot end-effector (EE).
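As a rough illustration of the heatmap supervision described above, the sketch below renders one 2D Gaussian heatmap per joint; a denser skeleton simply supplies more joint rows. This is a generic sketch, not the paper's implementation: the function name, heatmap size, and sigma are assumptions chosen for the example.

```python
import numpy as np

def joint_heatmaps(joints_uv, size=64, sigma=2.0):
    """Render one Gaussian heatmap per 2D joint location (illustrative sketch).

    joints_uv: (K, 2) array of (u, v) pixel coordinates within [0, size).
    Returns a (K, size, size) float32 array peaking at each joint.
    """
    ys, xs = np.mgrid[0:size, 0:size]          # pixel coordinate grids
    maps = np.empty((len(joints_uv), size, size), dtype=np.float32)
    for k, (u, v) in enumerate(joints_uv):
        d2 = (xs - u) ** 2 + (ys - v) ** 2     # squared distance to the joint
        maps[k] = np.exp(-d2 / (2.0 * sigma ** 2))
    return maps

# 21 joints is the standard hand skeleton; a 128-joint variant like the one
# mentioned in the abstract would pass a (128, 2) array instead.
rng = np.random.default_rng(0)
hm = joint_heatmaps(rng.uniform(8, 56, size=(21, 2)))
```

Summing or max-pooling such denser per-joint maps yields a soft hand-region map, which is one plausible way a heatmap stack can act as approximate mask-aware spatial attention.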
11 Apr 2024: Submitted to TechRxiv
17 Apr 2024: Published in TechRxiv