Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis
IEEE Signal Processing Letters (IF 3.9), Pub Date: 2022-09-02, DOI: 10.1109/lsp.2022.3203888
Yi Lei 1 , Shan Yang 2 , Xinfa Zhu 1 , Lei Xie 1 , Dan Su 2

By borrowing emotional expressions from an emotional source speaker, cross-speaker emotion transfer is an effective way to produce emotional speech for target speakers who have no emotional training data. Because the emotion and timbre of the source speaker are heavily entangled in speech, existing approaches often struggle to balance speaker similarity against emotional expression in the synthetic speech of the target speaker. In this letter, we propose to disentangle timbre and emotion through information perturbation for cross-speaker emotion transfer, so that the model learns the emotional expression of the source speaker while maintaining the timbre of the target speaker. Specifically, we separately perturb the timbre- and emotion-related features (e.g., formant and pitch) of the source speech to obtain and model timbre- and emotion-independent signals, based on which the proposed model can deliver emotional expression for target speakers. Experimental results demonstrate that the proposed approach significantly outperforms the baselines in terms of naturalness and similarity, indicating the effectiveness of information perturbation for cross-speaker emotion transfer.
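
To make the idea of perturbing an emotion-related feature concrete, the following is a minimal sketch of one possible pitch perturbation applied to an F0 contour. It is an illustration only, not the authors' implementation: the function name `perturb_pitch`, the parameters `max_shift_semitones` and `max_range_scale`, their default values, and the convention that 0 Hz marks unvoiced frames are all assumptions introduced here. The abstract does not specify how the perturbation is carried out, and formant perturbation is not covered by this sketch.

```python
import numpy as np


def perturb_pitch(f0_hz, rng, max_shift_semitones=4.0, max_range_scale=1.5):
    """Randomly shift and rescale a frame-level F0 contour (hypothetical helper).

    Assumption: frames with f0 == 0 are unvoiced and are left at 0.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    out = np.zeros_like(f0)
    if not voiced.any():
        return out

    # Work in log2 frequency so that shifts are musical intervals.
    log_f0 = np.log2(f0[voiced])
    center = np.median(log_f0)

    # Random global shift (in octaves) and random pitch-range scaling.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones) / 12.0
    scale = rng.uniform(1.0 / max_range_scale, max_range_scale)

    # Expand or compress the contour around its median, then shift it.
    out[voiced] = 2.0 ** (center + shift + scale * (log_f0 - center))
    return out


# Usage: perturb a toy contour with two unvoiced frames.
rng = np.random.default_rng(0)
f0 = np.array([0.0, 200.0, 220.0, 0.0, 180.0])
perturbed = perturb_pitch(f0, rng)
```

Under these assumptions, the random shift and range scaling change the pitch level and pitch range of the contour while keeping its voiced/unvoiced pattern and the relative ordering of its values, so a model trained on such inputs cannot rely on the original pitch statistics.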

Updated: 2022-09-02