Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

Paper Title

Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

Authors

Zhouping Wang, Sarah Ostadabbas

Abstract

3D human body pose and shape estimation within a temporal sequence is critical for understanding human behavior. Despite significant progress in human pose estimation in recent years, most of it based on single images or videos, human motion estimation on live stream videos remains a rarely explored area because of its special requirements for real-time output and temporal consistency. To address this problem, we present a temporally embedded 3D human body pose and shape estimation (TePose) method to improve the accuracy and temporal consistency of pose estimation in live stream videos. TePose uses previous predictions as a bridge to feed back the error for better estimation in the current frame and to learn the correspondence between data frames and past predictions. A multi-scale spatio-temporal graph convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling. We propose a sequential data loading strategy to meet the special start-to-end data processing requirement of live streams. We demonstrate the importance of each proposed module with extensive experiments. The results show the effectiveness of TePose on widely used human pose benchmarks, with state-of-the-art performance.
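
The live stream constraint described in the abstract (only past frames are available, and earlier predictions are fed back into the current estimate) can be illustrated with a minimal Python sketch. This is not the TePose implementation: the names `stream_estimates` and `toy_estimator`, the window size, and the averaging rule are all hypothetical stand-ins chosen only to show the causal data flow.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # hypothetical stand-in for per-frame image features
Pose = List[float]   # hypothetical stand-in for pose and shape parameters


def stream_estimates(
    frames: Iterable[Frame],
    estimator: Callable[[List[Frame], List[Pose]], Pose],
    window: int = 4,
) -> Iterator[Pose]:
    """Yield one estimate per incoming frame using only past information."""
    past_frames: deque = deque(maxlen=window)  # most recent frames, oldest first
    past_preds: deque = deque(maxlen=window)   # most recent predictions, oldest first
    for frame in frames:
        past_frames.append(frame)
        # The estimator sees the current frame, earlier frames, and earlier
        # predictions; it never sees a future frame.
        pred = estimator(list(past_frames), list(past_preds))
        past_preds.append(pred)
        yield pred


def toy_estimator(frames: List[Frame], prev_preds: List[Pose]) -> Pose:
    """Assumed placeholder model: blend the current frame with the last prediction."""
    current = frames[-1]
    if not prev_preds:
        # Start of the stream: no history exists yet.
        return list(current)
    last = prev_preds[-1]
    return [0.5 * c + 0.5 * p for c, p in zip(current, last)]


if __name__ == "__main__":
    for estimate in stream_estimates([[0.0], [2.0], [4.0]], toy_estimator):
        print(estimate)
```

The bounded `deque` keeps memory constant however long the stream runs, and the empty-history branch shows why the first frames of a stream need separate handling.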
