Paper Title
Slow Motion Matters: A Slow Motion Enhanced Network for Weakly Supervised Temporal Action Localization
Paper Authors
Paper Abstract
Weakly supervised temporal action localization (WTAL) aims to localize actions in untrimmed videos with only weak supervision information (e.g., video-level labels). Most existing models process all input videos at a fixed temporal scale. However, such models are not sensitive to actions whose movement pace differs from the "normal" speed, especially slow-motion action instances, which complete their movements much more slowly than their normal-speed counterparts. This gives rise to the slow-motion blurring issue: it is hard to extract salient slow-motion information from videos viewed at "normal" speed. In this paper, we propose a novel framework termed Slow Motion Enhanced Network (SMEN) to improve the ability of a WTAL network by compensating for its limited sensitivity to slow-motion action segments. The proposed SMEN comprises a Mining module and a Localization module. The Mining module generates masks to mine slow-motion-related features by exploiting the relationships between normal motion and slow motion, while the Localization module leverages the mined slow-motion features as complementary information to improve the temporal action localization results. Our proposed framework can be easily adopted by existing WTAL networks and makes them more sensitive to slow-motion actions. Extensive experiments are conducted on three benchmarks, demonstrating the high performance of our proposed framework.
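To make the mine-then-fuse idea from the abstract concrete, below is a minimal PyTorch sketch of a mining module that produces a temporal mask from normal-speed features and a localization module that fuses the masked slow-motion features back in. This is an illustration under assumptions, not the authors' released implementation: the module names, feature shapes, the attention-based heuristic for the mask, and the top-k video-level pooling are all placeholders chosen for clarity.

```python
# Illustrative sketch only; all names, shapes, and the mask heuristic are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiningModule(nn.Module):
    """Generates a temporal mask from normal-speed features, emphasizing
    segments where the normal-speed view responds weakly (a proxy for
    possible slow-motion content in this sketch)."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.attn = nn.Conv1d(feat_dim, 1, kernel_size=3, padding=1)

    def forward(self, normal_feat: torch.Tensor) -> torch.Tensor:
        # normal_feat: (B, C, T) snippet-level features at the original temporal scale
        attn = torch.sigmoid(self.attn(normal_feat))   # (B, 1, T)
        mask = 1.0 - attn                              # highlight low-response segments
        return mask

class LocalizationModule(nn.Module):
    """Fuses masked slow-motion features with normal-speed features and
    predicts a class activation sequence (CAS) for weak supervision."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv1d(feat_dim, num_classes, kernel_size=1)

    def forward(self, normal_feat, slow_feat, mask):
        fused = normal_feat + mask * slow_feat         # slow motion as complementary cue
        cas = self.classifier(fused)                   # (B, num_classes, T)
        # Top-k temporal pooling to obtain video-level scores for the video-level labels
        k = max(1, cas.shape[-1] // 8)
        video_score = cas.topk(k=k, dim=-1).values.mean(dim=-1)
        return cas, video_score

if __name__ == "__main__":
    B, C, T, K = 2, 2048, 64, 20
    normal_feat = torch.randn(B, C, T)
    # "Slow-motion" stream: e.g., features of a temporally slowed clip,
    # re-interpolated back to T snippets so both streams align.
    slow_feat = F.interpolate(torch.randn(B, C, T // 2), size=T, mode="linear")
    mask = MiningModule(C)(normal_feat)
    cas, score = LocalizationModule(C, K)(normal_feat, slow_feat, mask)
    print(cas.shape, score.shape)  # (2, 20, 64), (2, 20)
```

In this sketch the mask simply re-weights the slow-motion stream before fusion; the actual SMEN may compute the mask and perform the fusion differently, but the overall flow (mine slow-motion-related features, then use them as complementary information for localization) follows the abstract.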