Paper Title

GIMO: Gaze-Informed Human Motion Prediction in Context

Paper Authors

Yang Zheng, Yanchao Yang, Kaichun Mo, Jiaman Li, Tao Yu, Yebin Liu, C. Karen Liu, Leonidas J. Guibas

Paper Abstract

Predicting human motion is critical for assistive robots and AR/VR applications, where the interaction with humans needs to be safe and comfortable. Meanwhile, an accurate prediction depends on understanding both the scene context and human intentions. Even though many works study scene-aware human motion prediction, the latter is largely underexplored due to the lack of ego-centric views that disclose human intent and the limited diversity in motion and scenes. To reduce the gap, we propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, as well as ego-centric views with the eye gaze that serves as a surrogate for inferring human intent. By employing inertial sensors for motion capture, our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects. We perform an extensive study of the benefits of leveraging the eye gaze for ego-centric human motion prediction with various state-of-the-art architectures. Moreover, to realize the full potential of the gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches. Our network achieves the top performance in human motion prediction on the proposed dataset, thanks to the intent information from eye gaze and the denoised gaze feature modulated by the motion. Code and data can be found at https://github.com/y-zheng18/GIMO.
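The abstract's key architectural idea is bidirectional communication between the gaze branch and the motion branch: gaze features inject intent into motion prediction, while motion features help denoise the gaze. The paper's exact mechanism is not specified in this abstract, so below is a minimal PyTorch sketch of one plausible realization using cross-attention in both directions. The class name, feature dimensions, and the choice of nn.MultiheadAttention are illustrative assumptions, not the authors' implementation (see the linked repository for the real code).

# A minimal sketch (not the authors' code) of bidirectional communication
# between a gaze branch and a motion branch via cross-attention.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalGazeMotionBlock(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        # Motion queries attend to gaze tokens: gaze informs motion with intent.
        self.motion_from_gaze = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gaze queries attend to motion tokens: motion modulates (denoises) gaze.
        self.gaze_from_motion = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_motion = nn.LayerNorm(dim)
        self.norm_gaze = nn.LayerNorm(dim)

    def forward(self, motion_feat, gaze_feat):
        # motion_feat: (B, T_m, dim) tokens from the past pose sequence
        # gaze_feat:   (B, T_g, dim) tokens from the gaze stream
        m, _ = self.motion_from_gaze(motion_feat, gaze_feat, gaze_feat)
        g, _ = self.gaze_from_motion(gaze_feat, motion_feat, motion_feat)
        # Residual connections keep each branch's own signal intact.
        return self.norm_motion(motion_feat + m), self.norm_gaze(gaze_feat + g)

# Usage: fuse the two streams before a motion-prediction decoder.
block = BidirectionalGazeMotionBlock()
motion = torch.randn(2, 30, 256)  # e.g. 30 past pose frames
gaze = torch.randn(2, 30, 256)    # e.g. 30 gaze samples
motion_out, gaze_out = block(motion, gaze)

The point of the two-way exchange, as the abstract argues, is that each modality compensates for the other's weakness: raw eye gaze is noisy but carries intent, while motion is reliable but intent-agnostic.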
