Paper Title
Attention-Driven Body Pose Encoding for Human Activity Recognition
Paper Authors
Paper Abstract
This article proposes a novel attention-based body-pose encoding for human activity recognition that yields an enriched, learned representation of body pose. The enriched data complements the 3D body-joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream that encodes the spatial relationships between body joints at each time point, capturing the spatial distribution of the joints, and 2) a temporal stream that learns the temporal variation of each body joint over the entire sequence, yielding a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. We also capture contextual information from the RGB video stream using an Inception-ResNet-V2 model combined with multi-head attention and a bidirectional Long Short-Term Memory (LSTM) network. Finally, the RGB video stream is combined with the fused body-pose stream to give a novel end-to-end deep model for effective human activity recognition.
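The following is a minimal PyTorch sketch of the two-stream pose encoding described in the abstract: a spatial stream that embeds the joint configuration at each time step, a temporal stream that embeds each joint's trajectory over the sequence, and a multi-head attention layer that fuses them. All module names, feature dimensions, and the choice of which stream supplies the queries versus the keys/values are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of the spatial/temporal pose streams fused via
# multi-head attention. Dimensions and the query/key-value assignment
# are assumptions for illustration only.
import torch
import torch.nn as nn

class TwoStreamPoseFusion(nn.Module):
    """Encodes a 3D body-joint sequence with a spatial stream and a
    temporal stream, then fuses the two with multi-head attention."""

    def __init__(self, num_joints=25, seq_len=64, d_model=128, num_heads=8):
        super().__init__()
        # Spatial stream: at each time step, embed the configuration of
        # all joints (num_joints x 3 coordinates) into one d_model vector.
        self.spatial_embed = nn.Linear(num_joints * 3, d_model)
        # Temporal stream: for each joint, embed its trajectory over the
        # whole sequence (seq_len x 3 coordinates) into one d_model vector.
        self.temporal_embed = nn.Linear(seq_len * 3, d_model)
        # Multi-head attention fuses the streams.
        self.fusion = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, joints):
        # joints: (batch, seq_len, num_joints, 3)
        b, t, j, c = joints.shape
        # Spatial tokens, one per time step: (batch, seq_len, d_model)
        spatial = self.spatial_embed(joints.reshape(b, t, j * c))
        # Temporal tokens, one per joint: (batch, num_joints, d_model)
        temporal = self.temporal_embed(
            joints.permute(0, 2, 1, 3).reshape(b, j, t * c))
        # Fuse: spatial queries attend over temporal keys/values
        # (an assumed assignment; the reverse is equally plausible).
        fused, _ = self.fusion(query=spatial, key=temporal, value=temporal)
        fused = self.norm(fused + spatial)  # residual connection
        return fused.mean(dim=1)            # sequence-level pose feature

# Usage: a batch of 2 sequences, 64 frames, 25 joints, 3D coordinates.
pose_feat = TwoStreamPoseFusion()(torch.randn(2, 64, 25, 3))
print(pose_feat.shape)  # torch.Size([2, 128])
```

In the full model described above, this fused pose feature would then be concatenated with the contextual feature produced by the RGB branch (Inception-ResNet-V2 followed by multi-head attention and a bidirectional LSTM) before the final activity classifier.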