Paper Title

Semi-supervised Learning of Perceptual Video Quality by Generating Consistent Pairwise Pseudo-Ranks

Authors

Shankhanil Mitra, Saiyam Jogani, Rajiv Soundararajan

Abstract

Designing learning-based no-reference (NR) video quality assessment (VQA) algorithms for camera-captured videos is cumbersome due to the requirement of a large number of human annotations of quality. In this work, we propose a semi-supervised learning (SSL) framework exploiting many unlabelled videos and a very limited amount of labelled authentically distorted videos. Our main contributions are two-fold. Leveraging the benefits of consistency regularization and pseudo-labelling, our SSL model generates pairwise pseudo-ranks for the unlabelled videos using a student-teacher model on strong-weak augmented videos. We design the strong-weak augmentations to be quality invariant so that the unlabelled videos can be used effectively in SSL. The generated pseudo-ranks are used along with the limited labels to train our SSL model. Our primary focus in SSL for NR VQA is to learn the mapping from video feature representations to quality scores. We compare various feature extraction methods and show that our SSL framework can lead to improved performance on these features. In addition to the existing features, we present a spatial and temporal feature extraction method based on predicting spatial and temporal entropic differences. We show that these features help achieve a robust performance when trained with limited data, providing a better baseline to apply SSL. Extensive experiments on three popular VQA datasets demonstrate that a combination of our novel SSL approach and features achieves an impressive performance in terms of correlation with human perception, even though the number of human-annotated videos may be limited.
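The core mechanism the abstract describes, a teacher scoring weakly augmented unlabelled video pairs to produce pairwise pseudo-ranks that supervise a student on strongly augmented views, can be illustrated with a minimal NumPy sketch. The function names, the score-gap threshold for discarding ambiguous pairs, and the hinge-style ranking loss below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def generate_pseudo_ranks(teacher_a, teacher_b, gap_threshold=0.05):
    """Pairwise pseudo-ranks from teacher scores on weakly augmented videos.

    Returns +1 where video A is judged better than B, -1 where worse,
    and 0 (pair discarded) where the score gap is below `gap_threshold`.
    The threshold is an illustrative assumption for filtering unreliable pairs.
    """
    gap = teacher_a - teacher_b
    return np.where(np.abs(gap) <= gap_threshold, 0, np.sign(gap)).astype(int)

def pairwise_ranking_loss(student_a, student_b, ranks, margin=0.5):
    """Hinge-style margin ranking loss on student scores (strong augmentations).

    Only pairs with a non-zero pseudo-rank contribute; the student is pushed
    to agree with the teacher's ordering by at least `margin`.
    """
    mask = ranks != 0
    diffs = ranks[mask] * (student_a[mask] - student_b[mask])
    return np.maximum(0.0, margin - diffs).mean()

# Toy example: three unlabelled pairs scored by the teacher.
teacher_a = np.array([0.8, 0.4, 0.6])
teacher_b = np.array([0.3, 0.5, 0.6])
ranks = generate_pseudo_ranks(teacher_a, teacher_b)   # [1, -1, 0]

# Student scores on strongly augmented views of the same pairs.
student_a = np.array([0.9, 0.3, 0.5])
student_b = np.array([0.2, 0.6, 0.5])
loss = pairwise_ranking_loss(student_a, student_b, ranks)
```

In this sketch, the third pair is dropped because the teacher's scores tie, mirroring the idea that only confident pairwise orderings become pseudo-labels; the remaining pairs supervise the student through the ranking loss rather than through absolute quality scores.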
