
Multi-Anchor Offset Human Representation based Coarse-to-Fine Diffusion Model for 3D Human Pose Estimation in Images
  • Qianxing Li,
  • Dehui Kong,
  • Jinghua Li,
  • Dongpan Chen,
  • Baocai Yin
Qianxing Li

Corresponding Author:[email protected]



3D human pose estimation (3DHPE) in images aims to estimate 3D joint positions from images. The state of the art in 3DHPE is dominated by deep learning models, whose accuracy depends strongly on the loss function. Existing 3DHPE methods usually define the loss as the Euclidean distance between the predicted joint locations and the ground truth, which conflates two different kinds of error: error caused by differences in pose structure, and all other error. These two kinds of error have clearly different characteristics and should not be treated equally, so decoupling them and optimizing them separately is one way to improve 3DHPE accuracy. However, existing human pose representations are not suited to distinguishing them. To tackle this problem, we propose a novel Multi-Anchor Offset human Representation (MAOR), which locates each joint by its offsets from a group of selected high-precision joints called Multi-Anchors. With MAOR, the pose error related to distortion of the spatial structure can be measured independently of other errors, which helps improve the accuracy of pose estimation. We then propose a novel MAOR-based coarse-to-fine diffusion model (MAOR-DiffPose) for pose estimation, which optimizes the different types of pose error step by step. First, a MAOR-based Denoising Process (MDP) explicitly optimizes the spatial structure of 3D poses by describing poses with MAOR, and improves the inductive learning ability of MAOR-DiffPose by extracting view-independent features. Second, a Joint Coordinate based Denoising Process assisted by MAOR (JCDPaM) enriches the input features by combining MAOR with the joint-coordinate pose representation, and optimizes the joint coordinates of 3D poses with the assistance of MAOR. MAOR-DiffPose achieves accurate 3DHPE by iterating the MDP and JCDPaM modules. Comprehensive experiments on the widely used 3DHPE benchmarks Human3.6M and MPI-INF-3DHP show that the proposed method outperforms state-of-the-art methods.
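
The abstract does not give the exact MAOR formulation, so the following is only a minimal sketch under stated assumptions: each joint is described by its offset vectors from a small set of anchor joints, and the structural error is taken as the mean distance between predicted and ground-truth offsets. The function names `multi_anchor_offsets` and `offset_error`, the anchor indices, and the toy pose are hypothetical and are not taken from the paper. The sketch illustrates why a joint-coordinate loss mixes error types: a rigid shift of the whole pose changes the joint-coordinate error but leaves the offset-based error at zero.

```python
import math

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance between matching joints."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def multi_anchor_offsets(pose, anchors):
    """For each joint, the offset vectors from every anchor joint to that joint.

    Assumed form of a multi-anchor offset representation (not the paper's definition).
    """
    return [
        [tuple(joint[k] - pose[a][k] for k in range(3)) for a in anchors]
        for joint in pose
    ]

def offset_error(pred, gt, anchors):
    """Mean Euclidean distance between predicted and ground-truth offset vectors."""
    pred_off = multi_anchor_offsets(pred, anchors)
    gt_off = multi_anchor_offsets(gt, anchors)
    dists = [
        math.dist(p, g)
        for p_joint, g_joint in zip(pred_off, gt_off)
        for p, g in zip(p_joint, g_joint)
    ]
    return sum(dists) / len(dists)

# Toy 4-joint pose; joints 0 and 1 are arbitrarily chosen as anchors.
gt = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 2.0, 0.0)]
anchors = [0, 1]

# Case 1: the whole pose is shifted rigidly; its structure is unchanged.
shifted = [(x + 0.5, y, z) for x, y, z in gt]

# Case 2: a single non-anchor joint is displaced; the structure is distorted.
bent = list(gt)
bent[3] = (1.0, 2.0, 0.5)

print("shifted:", mpjpe(shifted, gt), offset_error(shifted, gt, anchors))
print("bent:   ", mpjpe(bent, gt), offset_error(bent, gt, anchors))
```

In this toy setup the shifted pose has a joint-coordinate error of 0.5 but an offset error of 0, while the bent pose has a nonzero error under both measures. Offsets of this kind are invariant to translation only; how the actual method obtains view-independent features is not specified in the abstract.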
19 May 2024: Submitted to TechRxiv
25 May 2024: Published in TechRxiv