Paper Title
SWTF: Sparse Weighted Temporal Fusion for Drone-Based Activity Recognition
Paper Authors
Paper Abstract
Drone-camera-based human activity recognition (HAR) has received significant attention from the computer vision research community in the past few years. A robust and efficient HAR system plays a pivotal role in fields like video surveillance, crowd behavior analysis, sports analysis, and human-computer interaction. What makes the task challenging are complex poses, varying viewpoints, and the environmental scenarios in which the action takes place. To address such complexities, in this paper we propose a novel Sparse Weighted Temporal Fusion (SWTF) module that utilizes sparsely sampled video frames to obtain a globally weighted temporal fusion outcome. The proposed SWTF is divided into two components. First, a temporal segment network that sparsely samples a given set of frames. Second, a weighted temporal fusion module that fuses feature maps derived from optical flow with those from raw RGB images. This is followed by the base network, which comprises a convolutional neural network module along with fully connected layers that perform the activity recognition. The SWTF network can be used as a plug-in module for existing deep CNN architectures, optimizing them to learn temporal information while eliminating the need for a separate temporal stream. It has been evaluated on three publicly available benchmark datasets, namely Okutama, MOD20, and Drone-Action. The proposed model achieves accuracies of 72.76%, 92.56%, and 78.86% on the respective datasets, surpassing the previous state-of-the-art performance by a significant margin.
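The two components described in the abstract can be illustrated with a minimal sketch. This is an assumption-based reading, not the paper's implementation: `sparse_sample` mimics temporal-segment-network-style sampling (one frame index drawn per equal-length segment), and `weighted_temporal_fusion` uses a single assumed scalar weight `alpha` to combine RGB and optical-flow feature maps before averaging over segments; the actual SWTF weighting scheme may differ.

```python
import numpy as np

def sparse_sample(num_frames, num_segments, rng=None):
    """Temporal-segment-style sparse sampling: divide the video into
    equal segments and draw one frame index from each.
    (Hypothetical reading of the paper's sampling step.)"""
    if rng is None:
        rng = np.random.default_rng(0)
    # Segment boundaries over the frame range [0, num_frames)
    edges = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    # One random index per segment; intervals are disjoint, so the
    # resulting indices are strictly increasing in time.
    return np.array([rng.integers(lo, hi)
                     for lo, hi in zip(edges[:-1], edges[1:])])

def weighted_temporal_fusion(rgb_feats, flow_feats, alpha=0.6):
    """Weighted fusion of per-segment RGB and optical-flow feature
    maps (shape: segments x features). `alpha` is an assumed scalar
    weight; the fused maps are then averaged across segments to give
    a global temporal representation."""
    fused = alpha * rgb_feats + (1.0 - alpha) * flow_feats
    return fused.mean(axis=0)

# Example: sample 8 of 100 frames, then fuse dummy feature maps.
idx = sparse_sample(num_frames=100, num_segments=8)
rgb = np.random.default_rng(1).standard_normal((8, 16))
flow = np.random.default_rng(2).standard_normal((8, 16))
global_feat = weighted_temporal_fusion(rgb, flow)  # shape (16,)
```

The averaged `global_feat` would then be passed to the base network (CNN plus fully connected layers) for classification.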