Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

Paper Title

Perceptual Conversational Head Generation with Regularized Driver and Enhanced Renderer

Authors

Ailin Huang, Zhewei Huang, Shuchang Zhou

Abstract

This paper reports our solution for the ACM Multimedia ViCo 2022 Conversational Head Generation Challenge, which aims to generate vivid face-to-face conversation videos from audio and reference images. Our solution focuses on training a generalized audio-to-head driver using regularization and on assembling a renderer with high visual quality. We carefully tune the audio-to-behavior model and post-process the generated video with our foreground-background fusion module. On the official leaderboard, we took first place in the listening head generation track and second place in the talking head generation track. Our code is available at https://github.com/megvii-research/MM2022-ViCoPerceptualHeadGeneration.

