Paper Title
Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data
Paper Authors
Paper Abstract
The rapid increase in the amount of published visual data and the limited time of users create a demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach can adaptively select frames that are not relevant to conveying the information, without creating gaps in the final video. Our agent is textually and visually oriented to select which frames to remove in order to shrink the input video. Additionally, we propose a novel network, called the Visually-guided Document Attention Network (VDAN), which is able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level.
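To make the abstract's high-level description concrete, below is a minimal illustrative sketch, not the authors' implementation, of the core idea: project text and frame features into a shared embedding space (the role VDAN plays in the paper) and let a small policy network decide, frame by frame, whether to keep or drop a frame. All names (JointEmbedder, FrameSkipAgent), feature dimensions, and the keep/drop convention are assumptions made for illustration only.

```python
# Illustrative sketch only (hypothetical names and dimensions), written in PyTorch.
import torch
import torch.nn as nn

class JointEmbedder(nn.Module):
    """Stand-in for the VDAN idea: project text and frame features into a
    shared space and concatenate them as the agent's observation."""
    def __init__(self, text_dim=300, frame_dim=2048, embed_dim=128):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.frame_proj = nn.Linear(frame_dim, embed_dim)

    def forward(self, text_feat, frame_feat):
        return torch.cat([self.text_proj(text_feat),
                          self.frame_proj(frame_feat)], dim=-1)

class FrameSkipAgent(nn.Module):
    """Policy network: outputs probabilities over {keep, drop} for one frame."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(2 * embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 2), nn.Softmax(dim=-1))

    def forward(self, state):
        return self.policy(state)

# Per-frame keep/drop decision; a REINFORCE-style objective would then reward
# decisions whose kept frames align well with the accompanying text.
embedder, agent = JointEmbedder(), FrameSkipAgent()
text_feat = torch.randn(1, 300)      # e.g., an encoded instructional document
frame_feat = torch.randn(1, 2048)    # e.g., a CNN feature of the current frame
probs = agent(embedder(text_feat, frame_feat))
action = torch.multinomial(probs, 1)  # 0 = keep, 1 = drop (a convention chosen here)
```

In this sketch, dropping frames (rather than selecting a few keyframes or skims) is what avoids the visual gaps mentioned in the abstract: the output remains a contiguous, merely shorter, video.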