Paper title
Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation
Authors
Abstract
Heatmap representations have formed the basis of 2D human pose estimation systems for many years, but their generalizations for 3D pose have only recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y axes correspond to image space and the Z axis to metric depth around the subject. To obtain metric-scale predictions, these methods must include a separate, explicit post-processing step to resolve scale ambiguity. Further, they cannot encode body joint positions outside of the image boundaries, leading to incomplete pose estimates in case of image truncation. We address these limitations by proposing metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject, instead of being aligned with image space. We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner. This reinterpretation of the heatmap dimensions allows us to estimate complete metric-scale poses without test-time knowledge of the focal length or person distance and without relying on anthropometric heuristics in post-processing. Furthermore, as the image space is decoupled from the heatmap space, the network can learn to reason about joints beyond the image boundary. Using ResNet-50 without any additional learned layers, we obtain state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems. We make our code publicly available to facilitate further research (see https://vision.rwth-aachen.de/metro-pose3d).
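To make the idea of a heatmap defined in metric space concrete, the following is a minimal sketch of how one joint's volumetric heatmap could be decoded into metric coordinates. It is an illustration under assumptions, not the authors' implementation: the abstract does not state the decoding rule, so a soft-argmax (the expected voxel position under a softmax) is assumed here, and the function name `soft_argmax_metric`, the `(z, y, x)` array layout, and the 2.0 m cube side are all hypothetical choices.

```python
import numpy as np


def soft_argmax_metric(logits, cube_side_m=2.0):
    """Decode one joint's volumetric heatmap into metric coordinates.

    logits: array of shape (D, H, W), indexed as (z, y, x). (Assumed layout.)
    cube_side_m: assumed side length, in metres, of the cube around the subject.
    Returns an array (x, y, z) in metres, relative to the cube centre.
    """
    logits = np.asarray(logits, dtype=np.float64)

    # Numerically stable softmax over all voxels of the volume.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    coords = []
    # The array axes are (z, y, x); iterate so the output order is (x, y, z).
    for axis in (2, 1, 0):
        n = logits.shape[axis]
        # Voxel centres spread evenly over [-side/2, +side/2].
        centres = (np.arange(n) + 0.5) / n * cube_side_m - cube_side_m / 2
        # Marginal distribution along this axis, then its expected position.
        other_axes = tuple(a for a in range(3) if a != axis)
        marginal = probs.sum(axis=other_axes)
        coords.append(float(marginal @ centres))
    return np.array(coords)


# Usage: a volume with a single sharp peak decodes to that voxel's centre.
demo = np.zeros((4, 4, 4))
demo[0, 1, 3] = 1000.0  # peak at z=0, y=1, x=3
joint_xyz = soft_argmax_metric(demo, cube_side_m=2.0)
```

Because the voxel grid in this sketch is tied to a metric cube rather than to pixel coordinates, the decoded position is already in metres and is not limited to the part of the body visible in the image crop, which is the property the abstract attributes to the metric-space formulation.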