Paper Title
Feature Re-Learning with Data Augmentation for Video Relevance Prediction
Authors
Abstract
Predicting the relevance between two videos with respect to their visual content is a key component of content-based video recommendation and retrieval. Thanks to the increasing availability of pre-trained image and video convolutional neural network models, deep visual features are widely used for video content representation. However, because what makes two videos relevant is task-dependent, such off-the-shelf features are not always optimal for every task. Moreover, due to various concerns including copyright, privacy, and security, one may have access only to pre-computed video features rather than to the original videos. In this paper we propose feature re-learning to improve video relevance prediction without revisiting the original video content. In particular, re-learning is realized by projecting a given deep feature into a new space via an affine transformation. We optimize the re-learning process with a novel negative-enhanced triplet ranking loss. To generate more training data, we propose a new data augmentation strategy that works directly on frame-level and video-level features. Extensive experiments in the context of the Hulu Content-based Video Relevance Prediction Challenge 2018 demonstrate the effectiveness of the proposed method and its state-of-the-art performance for content-based video relevance prediction.