Semantic Video Moments Retrieval at Scale: A New Task and a Baseline

Paper Title

Semantic Video Moments Retrieval at Scale: A New Task and a Baseline

Paper Authors

Li, Na

Paper Abstract

Motivated by the increasing need to save search effort by obtaining relevant video clips instead of whole videos, we propose a new task, named Semantic Video Moments Retrieval at scale (SVMR), which aims at finding relevant videos and re-localizing the relevant clips within them. Rather than being a simple combination of video retrieval and video re-localization, our task is more challenging in several essential respects. In the first stage, SVMR should take into account two facts: 1) a positive candidate long video can contain plenty of irrelevant clips that are also semantically meaningful; 2) a long video can be positive for two totally different query clips if it contains clips relevant to both queries. The second stage, re-localization, also departs from existing video re-localization tasks, which assume that the reference video must contain a segment semantically similar to the query clip. In our scenario, by contrast, the retrieved long video can be a false positive because of inaccuracy in the first stage. To address these challenges, we propose a two-stage baseline solution: candidate video retrieval followed by a novel attention-based query-reference semantic alignment framework that re-localizes target clips from the candidate videos. Furthermore, we build two more appropriate benchmark datasets from the off-the-shelf ActivityNet-1.3 and HACS datasets for a thorough evaluation of SVMR models. Extensive experiments show that our solution outperforms several reference solutions.
