Paper Title
Video Background Music Generation: Dataset, Method and Evaluation
Paper Authors
Paper Abstract
Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task since it requires music-video datasets, efficient architectures for video-to-music generation, and reasonable metrics, none of which currently exist. To close this gap, we introduce a complete recipe including dataset, benchmark model, and evaluation metric for video background music generation. We present SymMV, a video and symbolic music dataset with various musical annotations. To the best of our knowledge, it is the first video-music dataset with rich musical annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we design a retrieval-based metric VMCP built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation. Our dataset and code are available at https://github.com/zhuole1025/SymMV.
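To make the idea of a retrieval-based correspondence metric concrete, below is a minimal sketch of how such a score could be computed once a video-music representation model has produced paired embeddings. This is a generic recall@k over cosine similarity, not the authors' VMCP implementation; the function name `recall_at_k`, the embedding shapes, and the random inputs are illustrative assumptions.

```python
import numpy as np

def recall_at_k(video_emb: np.ndarray, music_emb: np.ndarray, k: int = 5) -> float:
    """Fraction of videos whose ground-truth music clip ranks in the top-k retrievals.

    video_emb, music_emb: (N, D) arrays where row i of each forms a true pair.
    (Hypothetical illustration of a retrieval-based metric, not the paper's VMCP.)
    """
    # L2-normalize so dot products equal cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    sim = v @ m.T  # (N, N) video-to-music similarity matrix
    # For each video, rank all music clips by similarity (descending).
    ranks = np.argsort(-sim, axis=1)
    hits = (ranks[:, :k] == np.arange(len(v))[:, None]).any(axis=1)
    return float(hits.mean())

# Random embeddings stand in for the output of a trained video-music encoder.
rng = np.random.default_rng(0)
videos, musics = rng.normal(size=(100, 128)), rng.normal(size=(100, 128))
print(recall_at_k(videos, musics, k=5))
```

A higher recall@k indicates that generated (or paired) music is easier to retrieve from its corresponding video embedding, which is the intuition behind using retrieval accuracy as a proxy for video-music correspondence.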