A Survey on Neural Speech Synthesis
arXiv - CS - Multimedia Pub Date : 2021-06-29 , DOI: arxiv-2106.15561
Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech from text, is a hot research topic in the speech, language, and machine learning communities and has broad applications in industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models, and vocoders, and on several advanced topics, such as fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.

Updated: 2021-06-30