Streaming non-autoregressive model for any-to-many voice conversion

Paper title

Streaming non-autoregressive model for any-to-many voice conversion

Authors

Chen, Ziyi, Miao, Haoran, Zhang, Pengyuan

Abstract

Voice conversion models have been developed for decades, and current mainstream research focuses on non-streaming voice conversion. However, streaming voice conversion is better suited to practical application scenarios than non-streaming voice conversion. In this paper, we propose a streaming any-to-many voice conversion method based on a fully non-autoregressive model, which consists of a streaming Transformer-based acoustic model and a streaming vocoder. The acoustic model is composed of a pre-trained encoder taken from a streaming end-to-end automatic speech recognition model and a decoder modified from FastSpeech blocks. The vocoder is designed for the streaming task using a pseudo quadrature mirror filter bank and causal convolution. Experimental results show that the proposed method achieves strong performance in both latency and conversion quality, and that it can run in real time on both CPU and GPU.
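
The abstract names causal convolution as one ingredient of the streaming vocoder. As a minimal illustration of why causality matters for streaming, and not a reproduction of the authors' vocoder, the sketch below shows that a causal 1-D convolution can be computed chunk by chunk with only a small cache of past samples and still match the offline result. The names `causal_conv` and `StreamingCausalConv`, the kernel values, and the chunk sizes are all assumptions made for this example.

```python
from typing import List, Sequence


def causal_conv(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Offline causal convolution: y[t] = sum_k kernel[k] * x[t - k], with x[t] = 0 for t < 0."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if t - k >= 0:  # never look at future samples or before the start
                acc += w * signal[t - k]
        out.append(acc)
    return out


class StreamingCausalConv:
    """Hypothetical helper: the same convolution, fed one chunk at a time."""

    def __init__(self, kernel: Sequence[float]) -> None:
        self.kernel = list(kernel)
        # Left context: zeros stand in for the samples before the stream starts.
        self.history = [0.0] * (len(self.kernel) - 1)

    def process(self, chunk: Sequence[float]) -> List[float]:
        padded = self.history + list(chunk)
        offset = len(self.history)
        out = []
        for t in range(len(chunk)):
            acc = 0.0
            for k, w in enumerate(self.kernel):
                acc += w * padded[offset + t - k]
            out.append(acc)
        # Keep only the most recent samples that the next chunk will need.
        if offset:
            self.history = padded[-offset:]
        return out


if __name__ == "__main__":
    kernel = [0.5, 0.25, 0.25]          # illustrative values only
    signal = [1.0, 2.0, 3.0, 4.0, 5.0]

    stream = StreamingCausalConv(kernel)
    streamed = []
    for chunk in ([1.0, 2.0], [3.0], [4.0, 5.0]):  # arbitrary chunk sizes
        streamed.extend(stream.process(chunk))

    print(streamed == causal_conv(signal, kernel))
```

Because each output depends only on the current and earlier samples, the cache of `len(kernel) - 1` past samples is all the state that has to be carried between chunks. A non-causal kernel would instead need future samples, which adds look-ahead latency.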
