PIM: Video Coding using Perceptual Importance Maps

Paper title

PIM: Video Coding using Perceptual Importance Maps

Authors

Evgenya Pergament, Pulkit Tandon, Oren Rippel, Lubomir Bourdev, Alexander G. Anderson, Bruno Olshausen, Tsachy Weissman, Sachin Katti, Kedar Tatwawadi

Abstract

Human perception is at the core of lossy video compression, and numerous approaches to perceptual quality assessment and improvement have been developed over the past two decades. In determining perceptual quality, different spatio-temporal regions of a video differ in their relative importance to the human viewer. However, because such fine-grained information is challenging to infer or even to collect, it is often not used during compression beyond low-level heuristics. We present a framework which facilitates research into fine-grained subjective importance in compressed videos, which we then utilize to improve the rate-distortion performance of an existing video codec (x264). The contributions of this work are threefold: (1) we introduce a web tool which allows scalable collection of fine-grained perceptual importance, by having users interactively paint spatio-temporal maps over encoded videos; (2) we use this tool to collect a dataset of 178 videos with a total of 14443 frames of human-annotated spatio-temporal importance maps; and (3) we use our curated dataset to train a lightweight machine learning model which can predict these spatio-temporal importance regions. We demonstrate via a subjective study that encoding the videos in our dataset while taking the importance maps into account leads to higher perceptual quality at the same bitrate, with the videos encoded with importance maps preferred $1.8 \times$ over the baseline videos. Similarly, we show that for the 18 videos in the test set, the importance maps predicted by our model lead to higher perceptual quality videos, preferred $2 \times$ over the baseline at the same bitrate.
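
The abstract does not say how an importance map is handed to the encoder. As a rough illustration only, the sketch below assumes one plausible route: average a per-pixel importance map over encoder-sized blocks, then turn each block's importance into a quantization-parameter (QP) offset, with important blocks quantized more finely. The function names, the 16-pixel block size, and the `strength` scale are all hypothetical and are not taken from the paper or from x264.

```python
def block_importance(importance, block=16):
    """Average a per-pixel importance map over block x block tiles.

    `importance` is a list of rows of floats in [0, 1] (hypothetical input format).
    """
    height, width = len(importance), len(importance[0])
    rows = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            tile = [importance[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            row.append(sum(tile) / len(tile))
        rows.append(row)
    return rows


def qp_offsets(block_map, strength=6.0):
    """Map block importance to QP offsets centred on zero.

    Blocks above the mean importance get a negative offset (finer
    quantization); blocks below it get a positive one. Centring on the
    mean is an assumed heuristic for keeping the average QP unchanged;
    it does not by itself guarantee an identical bitrate.
    """
    values = [v for row in block_map for v in row]
    mean = sum(values) / len(values)
    return [[-strength * (v - mean) for v in row] for row in block_map]


# Toy 32x32 frame: only the top-left 16x16 region is marked important.
frame = [[1.0 if (y < 16 and x < 16) else 0.0 for x in range(32)]
         for y in range(32)]
offsets = qp_offsets(block_importance(frame))
print(offsets)  # [[-4.5, 1.5], [1.5, 1.5]]
```

In this toy case the important block receives a lower QP and the three remaining blocks share the compensating increase, so the offsets sum to zero.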
