Compare and Select: Video Summarization with Multi-Agent Reinforcement Learning

Paper Title

Compare and Select: Video Summarization with Multi-Agent Reinforcement Learning

Authors

Liu, Tianyu

Abstract

Video summarization aims to generate concise summaries from lengthy videos to give users a better viewing experience. Because summarization is subjective, purely supervised methods may inherit errors from the annotations. To address this subjectivity problem, we study how general users summarize videos. They usually watch the whole video, compare interesting clips, and select some of them to form a final summary. Inspired by this behaviour, we formulate summarization as multiple sequential decision-making processes and propose the Comparison-Selection Network (CoSNet), based on multi-agent reinforcement learning. Each agent focuses on one video clip and repeatedly changes its focus over the iterations; the final focus clips of all agents form the summary. The comparison network provides each agent with the visual feature from the clips and the chronological feature from the previous round, while the agent's selection network decides how to change its focus clip. A specially designed unsupervised reward and a supervised reward, each containing local and global parts, jointly drive policy improvement. Extensive experiments on two benchmark datasets show that CoSNet outperforms state-of-the-art unsupervised methods using the unsupervised reward and surpasses most supervised methods using the complete reward.
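
The iterative focus-shifting idea in the abstract can be illustrated with a toy sketch. This is not the CoSNet implementation: the names `summarize` and `random_policy`, the even initial placement of agents, and the three-way action space (shift focus by -1, 0, or +1 clips) are all assumptions made for illustration, and a random policy stands in for the learned comparison and selection networks.

```python
import random

def summarize(num_clips, num_agents, num_rounds, policy, seed=0):
    """Toy multi-agent loop: each agent holds one focus clip index and may
    shift it every round; the final focus clips form the summary."""
    rng = random.Random(seed)
    # Assumption: agents start evenly spread over the video.
    focus = [(i * num_clips) // num_agents for i in range(num_agents)]
    for _ in range(num_rounds):
        for agent in range(num_agents):
            # Assumption: an action is a shift of -1, 0, or +1 clips.
            step = policy(agent, focus, rng)
            # Clamp so the focus stays inside the video.
            focus[agent] = min(max(focus[agent] + step, 0), num_clips - 1)
    # Agents that end on the same clip contribute it only once.
    return sorted(set(focus))

def random_policy(agent, focus, rng):
    """Placeholder for a learned selection network."""
    return rng.choice((-1, 0, 1))

summary = summarize(num_clips=20, num_agents=4, num_rounds=10,
                    policy=random_policy)
print(summary)
```

In a learned version, `policy` would be replaced by a network that scores each agent's candidate moves from clip features, and the rewards described in the abstract would drive its training; none of that is modelled here.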
