Paper Title

2BiVQA: Double Bi-LSTM based Video Quality Assessment of UGC Videos

Paper Authors

Ahmed Telili, Sid Ahmed Fezza, Wassim Hamidouche, Hanene F. Z. Brachemi Meftah

Paper Abstract

Recently, with the growing popularity of mobile devices as well as video sharing platforms (e.g., YouTube, Facebook, TikTok, and Twitch), User-Generated Content (UGC) videos have become increasingly common and now account for a large portion of multimedia traffic on the Internet. Unlike professionally generated videos produced by filmmakers and videographers, UGC videos typically contain multiple authentic distortions, generally introduced by naive users during capture and processing. Quality prediction of UGC videos is of paramount importance for optimizing and monitoring their processing in hosting platforms, such as coding, transcoding, and streaming. However, blind quality prediction of UGC is quite challenging because the degradations of UGC videos are unknown and very diverse, in addition to the unavailability of a pristine reference. Therefore, in this paper, we propose an accurate and efficient Blind Video Quality Assessment (BVQA) model for UGC videos, which we name 2BiVQA, for Double Bi-LSTM Video Quality Assessment. The 2BiVQA metric consists of three main blocks: a pre-trained Convolutional Neural Network (CNN) that extracts discriminative features from image patches, which are then fed into two Recurrent Neural Networks (RNNs) for spatial and temporal pooling. Specifically, we use two Bidirectional Long Short-Term Memory (Bi-LSTM) networks: the first captures short-range dependencies between image patches, while the second captures long-range dependencies between frames to account for the temporal memory effect. Experimental results on recent large-scale UGC VQA datasets show that 2BiVQA achieves high performance at a lower computational cost than most state-of-the-art VQA models. The source code of our 2BiVQA metric is publicly available at: https://github.com/atelili/2BiVQA
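
To make the three-block design concrete, here is a minimal PyTorch sketch of the pipeline as described in the abstract: a pre-trained CNN turns each frame's patches into feature vectors, a first Bi-LSTM pools over patches within a frame, and a second Bi-LSTM pools over frames before a regression head predicts the quality score. The ResNet-50 backbone, hidden sizes, patch counts, and all names (`TwoBiVQASketch`, `video_patches`) are illustrative assumptions, not the authors' exact configuration; the reference implementation is in the linked repository and may differ.

```python
# Minimal architectural sketch of 2BiVQA based on the abstract.
# Shapes, hidden sizes, and the ResNet-50 backbone are assumptions.
import torch
import torch.nn as nn
from torchvision import models


class TwoBiVQASketch(nn.Module):
    def __init__(self, hidden_size: int = 128):
        super().__init__()
        # Pre-trained CNN backbone (downloads ImageNet weights); the
        # classifier is dropped so each patch maps to a 2048-d vector.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])
        # First Bi-LSTM: spatial pooling over the patches of one frame
        # (short-range dependencies between patches).
        self.spatial_lstm = nn.LSTM(2048, hidden_size,
                                    batch_first=True, bidirectional=True)
        # Second Bi-LSTM: temporal pooling over frame-level features
        # (long-range dependencies / temporal memory effect).
        self.temporal_lstm = nn.LSTM(2 * hidden_size, hidden_size,
                                     batch_first=True, bidirectional=True)
        # Regression head mapping the pooled video feature to one score.
        self.head = nn.Linear(2 * hidden_size, 1)

    def forward(self, video_patches: torch.Tensor) -> torch.Tensor:
        # video_patches: (batch, frames, patches, 3, H, W)
        b, t, p, c, h, w = video_patches.shape
        feats = self.cnn(video_patches.reshape(b * t * p, c, h, w))
        feats = feats.reshape(b * t, p, 2048)          # patch sequence per frame
        _, (h_s, _) = self.spatial_lstm(feats)         # pool over patches
        frame_feats = torch.cat([h_s[-2], h_s[-1]], dim=-1).reshape(b, t, -1)
        _, (h_t, _) = self.temporal_lstm(frame_feats)  # pool over frames
        video_feat = torch.cat([h_t[-2], h_t[-1]], dim=-1)
        return self.head(video_feat).squeeze(-1)       # predicted quality score


# Example: 2 videos, 8 frames each, 5 patches per frame of size 224x224.
model = TwoBiVQASketch().eval()
dummy = torch.randn(2, 8, 5, 3, 224, 224)
with torch.no_grad():
    print(model(dummy).shape)  # torch.Size([2])
```

Taking the final hidden states of each Bi-LSTM as the pooled representation is just one simple pooling choice made here for brevity; the released code may pool the sequence outputs differently.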
