Memory Attention: Robust Alignment using Gating Mechanism for End-to-End Speech Synthesis
IEEE Signal Processing Letters (IF 3.2). Pub Date: 2020-01-01. DOI: 10.1109/lsp.2020.3036349
Joun Yeop Lee, Sung Jun Cheon, Byoung Jin Choi, Nam Soo Kim

Recent end-to-end (e2e) speech synthesis systems usually employ attention techniques to align an input text sequence with a mel-spectrogram sequence. The attention-based e2e approach has shown state-of-the-art performance in speech synthesis. However, generating a stable and robust attention alignment that avoids serious failures such as repeated, missing, or mumbled phones remains an ongoing challenge. To mitigate these alignment failures, we propose a novel attention method for e2e speech synthesis called memory attention, inspired by the gating mechanism of the long short-term memory (LSTM) network. Leveraging the sequence modeling power of gating techniques, memory attention produces a stable alignment by controlling the amounts of content-based and location-based information. For performance evaluation, we compared the proposed memory attention algorithm with various conventional attention techniques in single-speaker and emotional speech synthesis scenarios. From the experimental results, we conclude that memory attention can robustly generate speech in a variety of styles.
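The abstract describes memory attention only at a high level: content-based and location-based attention information is blended through an LSTM-style gate. The sketch below is not the authors' formulation, which the abstract does not specify; it is a minimal PyTorch illustration of the general idea, and every class, parameter, and variable name in it is hypothetical.

```python
# Hypothetical sketch of a gated hybrid attention in the spirit of the
# abstract: a sigmoid gate, conditioned on the decoder query, controls how
# much content-based vs. location-based information enters the alignment.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedHybridAttention(nn.Module):
    def __init__(self, query_dim, key_dim, attn_dim,
                 n_filters=32, kernel_size=31):
        super().__init__()
        self.query_proj = nn.Linear(query_dim, attn_dim, bias=False)
        self.key_proj = nn.Linear(key_dim, attn_dim, bias=False)
        # Convolving the previous alignment yields location features,
        # as in standard location-sensitive attention.
        self.loc_conv = nn.Conv1d(1, n_filters, kernel_size,
                                  padding=(kernel_size - 1) // 2, bias=False)
        self.loc_proj = nn.Linear(n_filters, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1, bias=False)
        # LSTM-style sigmoid gate: decides the content/location mix.
        self.gate = nn.Linear(query_dim, attn_dim)

    def forward(self, query, keys, prev_align):
        # query: (B, query_dim); keys: (B, T, key_dim); prev_align: (B, T)
        content = self.key_proj(keys) + self.query_proj(query).unsqueeze(1)
        loc_feats = self.loc_conv(prev_align.unsqueeze(1)).transpose(1, 2)
        location = self.loc_proj(loc_feats)
        g = torch.sigmoid(self.gate(query)).unsqueeze(1)  # (B, 1, attn_dim)
        energies = self.score(torch.tanh(g * content + (1 - g) * location))
        align = F.softmax(energies.squeeze(-1), dim=-1)   # (B, T)
        context = torch.bmm(align.unsqueeze(1), keys).squeeze(1)
        return context, align
```

One plausible reading of the robustness claim is that conditioning the gate on the decoder state lets the model lean on location information whenever content matching is ambiguous, discouraging the jumps in alignment that produce repeated or skipped phones.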

Updated: 2020-01-01