Paper Title
Towards a Perceived Audiovisual Quality Model for Immersive Content
Authors
Abstract
This paper studies the quality of multimedia content, focusing on 360 video and Ambisonic spatial audio reproduced using a head-mounted display and a multichannel loudspeaker setup. Encoding parameters following basic video quality test conditions for 360 video were selected, and a low-bitrate codec was used for the audio encoding. Three subjective experiments were performed, for audio, video, and audiovisual content respectively. Peak signal-to-noise ratio (PSNR) and its variants for 360 video were computed to obtain objective quality metrics and subsequently correlated with the subjective video scores. This study shows that Cross-Format SPSNR-NN has a slightly higher linear and monotonic correlation with the subjective scores over all video sequences. For the audiovisual model, a power model shows the highest correlation between test data and predicted scores. We conclude that, to enable the development of a superior predictive model, a high-quality, critical, synchronized audiovisual database is required. Furthermore, comprehensive assessor training prior to testing may be beneficial to improve the assessors' discrimination ability, particularly with respect to multichannel audio reproduction. To further improve the performance of audiovisual quality models for immersive content, in addition to developing broader and more critical audiovisual databases, the subjective testing methodology needs to evolve to provide greater resolution and robustness.
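To illustrate the kind of analysis the abstract describes, the following minimal sketch computes plain PSNR and then the linear (Pearson) and monotonic (Spearman) correlation between objective and subjective scores. It is not the authors' implementation: it uses ordinary PSNR rather than the spherical variants such as SPSNR-NN, the assumption that "linear and monotonic correlation" means Pearson and Spearman coefficients is ours, and all function names are hypothetical.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Plain PSNR in dB between two equal-length sequences of pixel values.

    Assumption: 8-bit content (peak = 255). The 360-video variants named in
    the abstract weight or resample pixels on the sphere; that is not done here.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Given one objective score and one mean subjective score per test condition, `pearson(objective, subjective)` and `spearman(objective, subjective)` give the two correlation figures that can then be compared across metrics.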