Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

Paper Title

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

Authors

Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract

Since the beginning of the COVID-19 pandemic, remote conferencing and school teaching have become important tools. Previous applications aim to save commuting costs through real-time interaction. Our application instead aims to lower the production and reproduction costs of preparing communication materials. This paper proposes a system called Pre-Avatar, which generates a presentation video with a talking face of a target speaker from one front-face photo and a 3-minute voice recording. Technically, the system consists of three main modules: a user experience interface (UEI), a talking face module, and a few-shot text-to-speech (TTS) module. The system first clones the target speaker's voice, then generates the speech, and finally generates an avatar with appropriate lip and head movements. In any scenario, users only need to supply slides with different notes to generate a new video. The demo has been released here and will be published as free software.
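
The three-stage flow described in the abstract (clone the voice, synthesize speech from the slide notes, animate the face) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: `Slide`, `Segment`, `build_presentation`, and the three callables `clone_voice`, `synthesize`, and `animate_face` are hypothetical names standing in for the few-shot TTS and talking face modules, whose real interfaces are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Slide:
    image_path: str  # rendered slide image (hypothetical field)
    notes: str       # speaker notes to be read aloud (hypothetical field)


@dataclass
class Segment:
    slide: Slide
    audio: Any  # speech generated for this slide's notes
    video: Any  # talking-face clip driven by that speech


def build_presentation(
    slides: List[Slide],
    voice_sample: Any,                        # e.g. the 3-minute recording
    face_photo: Any,                          # e.g. the single front-face photo
    clone_voice: Callable[[Any], Any],        # hypothetical few-shot TTS adaptation step
    synthesize: Callable[[Any, str], Any],    # hypothetical TTS inference step
    animate_face: Callable[[Any, Any], Any],  # hypothetical talking face step
) -> List[Segment]:
    # Step 1: clone the target speaker's voice once, up front.
    voice = clone_voice(voice_sample)

    segments = []
    for slide in slides:
        # Step 2: generate speech for the slide notes in the cloned voice.
        audio = synthesize(voice, slide.notes)
        # Step 3: generate the avatar clip driven by that speech.
        video = animate_face(face_photo, audio)
        segments.append(Segment(slide=slide, audio=audio, video=video))
    return segments
```

Under these assumptions the voice is cloned once per call, so producing a new video for a different deck only requires calling `build_presentation` again with new slides and notes.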
