Paper Title

Telepresence Video Quality Assessment

Paper Authors

Zhenqiang Ying, Deepti Ghadiyaram, Alan Bovik

Paper Abstract

Video conferencing, which includes both video and audio content, has contributed to dramatic increases in Internet traffic, as the COVID-19 pandemic forced millions of people to work and learn from home. Because of this, efficient and accurate video quality tools are needed to monitor and perceptually optimize telepresence traffic streamed via Zoom, Webex, Meet, etc. However, existing models are limited in their prediction capabilities on multi-modal, live streaming telepresence content. Here we address the significant challenges of Telepresence Video Quality Assessment (TVQA) in several ways. First, we mitigated the dearth of subjectively labeled data by collecting ~2k telepresence videos from different countries, on which we crowdsourced ~80k subjective quality labels. Using this new resource, we created a first-of-its-kind online video quality prediction framework for live streaming, using a multi-modal learning framework with separate pathways that compute visual and audio quality predictions. Our all-in-one model is able to provide accurate quality predictions at the patch, frame, clip, and audiovisual levels. Our model achieves state-of-the-art performance on both existing quality databases and our new TVQA database, at a considerably lower computational expense, making it an attractive solution for mobile and embedded systems.
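The abstract describes a multi-modal framework with separate visual and audio pathways whose outputs are combined into an audiovisual quality prediction. Below is a minimal sketch of that general two-pathway idea, assuming precomputed per-clip visual and audio features; the class name, feature dimensions, layer sizes, and late-fusion scheme are illustrative assumptions, not the authors' released model.

```python
# Minimal sketch of a two-pathway audiovisual quality predictor (illustrative only).
import torch
import torch.nn as nn


class TwoPathwayAVQualityNet(nn.Module):
    def __init__(self, visual_dim=2048, audio_dim=128):
        super().__init__()
        # Visual pathway: maps per-clip visual features to a quality score.
        self.visual_head = nn.Sequential(
            nn.Linear(visual_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )
        # Audio pathway: maps per-clip audio features to a quality score.
        self.audio_head = nn.Sequential(
            nn.Linear(audio_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )
        # Fusion head: combines the two pathway scores into an
        # overall audiovisual quality prediction (late fusion).
        self.fusion = nn.Linear(2, 1)

    def forward(self, visual_feats, audio_feats):
        v_q = self.visual_head(visual_feats)                # visual-only quality
        a_q = self.audio_head(audio_feats)                  # audio-only quality
        av_q = self.fusion(torch.cat([v_q, a_q], dim=-1))   # fused audiovisual quality
        return v_q, a_q, av_q


# Example usage: score a batch of 4 clips from precomputed features.
model = TwoPathwayAVQualityNet()
visual_q, audio_q, av_q = model(torch.randn(4, 2048), torch.randn(4, 128))
print(av_q.shape)  # torch.Size([4, 1])
```

Returning the per-pathway scores alongside the fused score mirrors the abstract's claim that the model can report visual-only, audio-only, and combined audiovisual quality from a single network.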
