Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions

Paper title

Delving Deep into One-Shot Skeleton-based Action Recognition with Diverse Occlusions

Authors

Peng, Kunyu, Roitberg, Alina, Yang, Kailun, Zhang, Jiaming, Stiefelhagen, Rainer

Abstract

Occlusions are universal disruptions constantly present in the real world. Especially for sparse representations, such as human skeletons, a few occluded points might destroy the geometrical and temporal continuity, critically affecting the results. Yet research on data-scarce recognition from skeleton sequences, such as one-shot action recognition, does not explicitly consider occlusions despite their everyday pervasiveness. In this work, we explicitly tackle body occlusions for Skeleton-based One-shot Action Recognition (SOAR). We mainly consider two occlusion variants: 1) random occlusions and 2) more realistic occlusions caused by diverse everyday objects, which we generate by projecting existing IKEA 3D furniture models into the camera coordinate system of the 3D skeletons with different geometric parameters. We leverage the proposed pipeline to blend out portions of skeleton sequences from three popular action recognition datasets and formalize the first benchmark for SOAR from partially occluded body poses. Another key property of our benchmark is the more realistic occlusions generated by everyday objects, as even in standard recognition from 3D skeletons, only randomly missing joints have been considered. We re-evaluate existing state-of-the-art frameworks for SOAR in the light of this new task and further introduce Trans4SOAR, a new transformer-based model that leverages three data streams and a mixed attention fusion mechanism to alleviate the adverse effects caused by occlusions. While our experiments demonstrate a clear decline in accuracy with missing skeleton portions, this effect is smaller with Trans4SOAR, which outperforms other architectures on all datasets. Although we specifically focus on occlusions, Trans4SOAR additionally yields state-of-the-art results in standard SOAR without occlusion, surpassing the best published approach by 2.85% on NTU-120.
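The first occlusion variant in the abstract, random occlusion, can be illustrated with a minimal sketch. The abstract does not specify the authors' exact protocol, so everything below is an assumption for illustration: the function name randomly_occlude, the (frames, joints, xyz) array layout, hiding the same joints across the whole sequence, and marking hidden joints by setting their coordinates to zero.

```python
import numpy as np


def randomly_occlude(skeleton, ratio, rng=None):
    """Zero out a random subset of joints for the whole sequence.

    skeleton: array of shape (T, J, 3): frames, joints, xyz (assumed layout).
    ratio:    fraction of joints to hide, in [0, 1].
    Returns the occluded copy and a boolean visibility mask of shape (J,).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    rng = np.random.default_rng() if rng is None else rng

    num_joints = skeleton.shape[1]
    num_hidden = int(round(ratio * num_joints))

    # Pick distinct joint indices to hide (assumption: same joints in every frame).
    hidden = rng.choice(num_joints, size=num_hidden, replace=False)
    visible = np.ones(num_joints, dtype=bool)
    visible[hidden] = False

    # Work on a copy so the input sequence is left untouched.
    occluded = skeleton.copy()
    occluded[:, ~visible, :] = 0.0
    return occluded, visible
```

For a 25-joint skeleton and ratio=0.2, this hides 5 joints and leaves 20 visible. The second, more realistic variant (projecting furniture models into the camera coordinate system) needs 3D geometry and camera parameters and is not covered by this sketch.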
