Paper Title
Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis
Paper Authors
Paper Abstract
Camera-captured human pose is the outcome of several sources of variation. The performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks. As a result, the learned model inculcates not only task bias but also dataset bias, because of its strong reliance on annotated samples; the same holds true for weakly-supervised models. Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames. We leverage prior knowledge of the human skeleton and pose in the form of a single part-based 2D puppet model, human pose articulation constraints, and a set of unpaired 3D poses. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, not only facilitates the discovery of interpretable pose disentanglement but also allows us to operate on videos with diverse camera movements. Qualitative results on unseen in-the-wild datasets establish our superior generalization across multiple tasks beyond the primary tasks of 3D pose estimation and part segmentation. Furthermore, we demonstrate state-of-the-art weakly-supervised 3D pose estimation performance on both the Human3.6M and MPI-INF-3DHP datasets.
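The abstract's key technical claim is a differentiable formalization that bridges the 3D pose and spatial part maps, so that losses defined on synthesized images can backpropagate to the pose. The sketch below is a rough illustration of that idea, not the authors' implementation: it projects 3D joints to 2D with an assumed weak-perspective camera and renders a soft Gaussian map around each bone segment. All function names, the camera model, and the Gaussian falloff are illustrative assumptions.

```python
# Minimal sketch (not the paper's code): a differentiable 3D-pose -> part-map
# bridge, so an image-synthesis loss on the maps can update the 3D pose.
import torch

def project_weak_perspective(joints3d, scale=1.0):
    """Assumed weak-perspective camera: drop depth, apply a global scale."""
    return scale * joints3d[..., :2]  # (B, J, 2)

def render_part_maps(joints2d, parent, size=64, sigma=0.05):
    """Render one soft spatial map per skeletal part (bone).

    Each map is a Gaussian falloff around the 2D segment joining a joint
    to its parent; every operation is differentiable w.r.t. the joint
    locations, so gradients flow back through the projection to 3D.
    """
    B = joints2d.shape[0]
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, size), torch.linspace(-1, 1, size), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(1, 1, -1, 2)  # (1, 1, HW, 2)

    a = joints2d[:, parent].unsqueeze(2)   # segment starts, (B, P, 1, 2)
    b = joints2d[:, 1:].unsqueeze(2)       # segment ends,   (B, P, 1, 2)
    ab = b - a
    # Closest point on each segment to every pixel (clamped projection).
    t = ((grid - a) * ab).sum(-1) / (ab * ab).sum(-1).clamp(min=1e-8)
    t = t.clamp(0.0, 1.0).unsqueeze(-1)
    d2 = ((grid - (a + t * ab)) ** 2).sum(-1)  # squared distance, (B, P, HW)
    maps = torch.exp(-d2 / (2 * sigma ** 2))
    return maps.view(B, -1, size, size)        # (B, P, H, W)

# Toy usage: a 4-joint kinematic chain (root -> 3 children), hence 3 parts.
pose3d = torch.randn(2, 4, 3, requires_grad=True)
parent = torch.tensor([0, 1, 2])               # parent index of joints 1..3
part_maps = render_part_maps(project_weak_perspective(pose3d), parent)
part_maps.sum().backward()                     # gradients reach the 3D pose
```

In the paper's pipeline these part maps would condition the image synthesis network; the toy `backward()` call above only demonstrates that gradients propagate end to end under the stated assumptions.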