Title
MHMS: Multimodal Hierarchical Multimedia Summarization
Authors
Abstract
Multimedia summarization with multimodal output can play an essential role in real-world applications, e.g., automatically generating cover images and titles for news articles or providing introductions to online videos. In this work, we propose a multimodal hierarchical multimedia summarization (MHMS) framework that generates both video and textual summaries through interaction between the visual and language domains. MHMS contains a video segmentation and summarization module and a textual segmentation and summarization module. It formulates a cross-domain alignment objective based on optimal transport distance, which exploits cross-domain interaction to generate representative keyframes and textual summaries. We evaluate MHMS on three recent multimodal datasets and demonstrate its effectiveness in producing high-quality multimodal summaries.
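The abstract does not give the exact form of the optimal-transport alignment objective. As a minimal sketch of the general idea only, the code below measures how well a set of video-frame embeddings aligns with a set of sentence embeddings using entropy-regularized optimal transport solved with Sinkhorn iterations. Several choices here are assumptions, not details from the paper: the cosine cost `1 - cos(x, y)`, uniform mass on both sides, the regularization `eps`, and the helper names `cosine_cost` and `sinkhorn_ot_distance`, which are hypothetical. Sinkhorn is a common approximation of the OT distance and may differ from what MHMS actually uses.

```python
import math
from typing import List

Vector = List[float]


def cosine_cost(a: Vector, b: Vector) -> float:
    """Hypothetical cost: 1 - cosine similarity (vectors must be nonzero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def sinkhorn_ot_distance(frames: List[Vector], sentences: List[Vector],
                         eps: float = 0.1, n_iters: int = 200) -> float:
    """Entropy-regularized OT distance between frame and sentence embeddings.

    Assumption: both sets carry uniform mass and share one embedding space.
    """
    n, m = len(frames), len(sentences)
    # Pairwise cross-domain cost matrix C[i][j].
    cost = [[cosine_cost(f, s) for s in sentences] for f in frames]
    # Gibbs kernel K = exp(-C / eps).
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on frames (assumed)
    b = [1.0 / m] * m  # uniform mass on sentences (assumed)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternate scalings so the plan's row sums match a, column sums match b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan T = diag(u) K diag(v); the distance is <T, C>.
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0]]
    print(sinkhorn_ot_distance(frames, [[1.0, 0.0], [0.0, 1.0]]))    # aligned
    print(sinkhorn_ot_distance(frames, [[-1.0, 0.0], [0.0, -1.0]]))  # mismatched
```

With these toy inputs, the aligned pair yields a distance close to 0 and the mismatched pair a distance close to 1. A low OT distance therefore signals that the selected keyframes and the text cover the same content, which is the kind of cross-domain signal an alignment objective can reward.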