Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis
IEEE Signal Processing Letters (IF 3.2), Pub Date: 9-2-2022, DOI: 10.1109/lsp.2022.3203888
Yi Lei¹, Shan Yang², Xinfa Zhu¹, Lei Xie¹, Dan Su²

By borrowing emotional expressions from an emotional speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers who have no emotional training data. Since the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity against emotional expressiveness in the synthetic speech of the target speaker. In this letter, we propose to disentangle timbre and emotion through information perturbation for cross-speaker emotion transfer, so that the model effectively learns the emotional expression of the source speaker while preserving the timbre of the target speaker. Specifically, we separately perturb the timbre-related and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre-independent and emotion-independent signals, based on which the proposed model can deliver emotional expression for target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
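The perturbation idea can be illustrated at the feature level. The sketch below is a minimal illustration under stated assumptions, not the letter's implementation: it assumes a source-filter view of speech (a spectral envelope sampled on evenly spaced frequency bins, plus a frame-wise F0 contour with 0 marking unvoiced frames), and the function names `perturb_formant` and `perturb_pitch`, as well as the random ranges, are hypothetical. Formant perturbation warps the envelope's frequency axis by a factor `alpha` to disturb timbre cues; pitch perturbation rescales the F0 mean and compresses its range to disturb emotion-related prosody.

```python
import random
from typing import List


def perturb_formant(envelope: List[float], alpha: float) -> List[float]:
    """Warp the frequency axis of one spectral-envelope frame by `alpha`.

    Output bin k takes the linearly interpolated value of input bin k / alpha,
    so a formant peak at bin p moves to roughly bin p * alpha: alpha > 1 raises
    formants, alpha < 1 lowers them. Positions past the top bin are clamped.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / alpha          # fractional source bin that maps to output bin k
        lo = int(src)            # floor, since src >= 0
        if lo >= n - 1:
            warped.append(envelope[-1])  # beyond the last bin: clamp
        else:
            frac = src - lo
            warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped


def perturb_pitch(f0: List[float], shift: float, range_scale: float) -> List[float]:
    """Scale the mean F0 by `shift` and its variation around the mean by `range_scale`.

    Unvoiced frames (f0 == 0) stay 0. range_scale = 0 flattens the voiced
    contour to its mean, removing pitch movement that carries emotional prosody.
    """
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return list(f0)
    mean = sum(voiced) / len(voiced)
    return [shift * (mean + range_scale * (v - mean)) if v > 0 else 0.0 for v in f0]


if __name__ == "__main__":
    rng = random.Random(0)  # hypothetical per-utterance random draws
    envelope = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # toy frame, one peak at bin 1
    f0 = [0.0, 200.0, 220.0, 240.0, 0.0]                   # toy contour in Hz
    # Timbre-perturbed input: random formant warp (range is an assumption).
    print(perturb_formant(envelope, rng.uniform(0.8, 1.25)))
    # Emotion-perturbed input: random pitch shift and range compression (assumed ranges).
    print(perturb_pitch(f0, rng.uniform(0.8, 1.25), rng.uniform(0.0, 0.5)))
```

Applying the two functions separately mirrors the abstract's idea of deriving one signal with timbre cues disturbed and another with emotion cues disturbed; how such signals are fed to the synthesis model is not specified by this sketch.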

Updated: 2024-08-28