Predicting ultrasound tongue image from lip images using sequence to sequence learning.
The Journal of the Acoustical Society of America (IF 2.4), Pub Date: 2020-06-01, DOI: 10.1121/10.0001328
Kele Xu 1 , Jianqiao Zhao 2 , Boqing Zhu 1 , Chaojie Zhao 3

Understanding the dynamic system that produces speech is essential to advancing speech science, and several simultaneous sensory streams can be leveraged to describe the process. Because the functional deformation of the tongue correlates with the speaker's lip shapes, this paper explores the association between the two. The problem is formulated as a sequence-to-sequence learning task, and a deep neural network is trained on unlabeled lip videos to predict the upcoming ultrasound tongue image sequence. Experimental results show that the model predicts the tongue's motion with satisfactory performance, which demonstrates that the learned network can capture the association between the two imaging modalities.
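
The sequence-to-sequence formulation can be illustrated with a minimal sketch of how training pairs might be cut from two synchronized recordings, with no manual labels needed. This is not the authors' pipeline: the function name `make_seq2seq_pairs`, the window lengths `n_in` and `n_out`, and the assumption that lip and tongue frames are time-aligned one-to-one are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_seq2seq_pairs(
    lip_frames: Sequence[Frame],
    tongue_frames: Sequence[Frame],
    n_in: int,
    n_out: int,
) -> List[Tuple[List[Frame], List[Frame]]]:
    """Slide a window over two time-aligned recordings (hypothetical helper).

    Each pair holds n_in consecutive lip frames and the n_out tongue
    frames that immediately follow them in time.
    """
    if len(lip_frames) != len(tongue_frames):
        raise ValueError("recordings must be time-aligned and equally long")
    if n_in < 1 or n_out < 1:
        raise ValueError("window lengths must be positive")

    pairs = []
    # Last start index that still leaves room for a full input and target.
    last_start = len(lip_frames) - n_in - n_out
    for start in range(last_start + 1):
        split = start + n_in
        source = list(lip_frames[start:split])            # observed lip frames
        target = list(tongue_frames[split:split + n_out])  # upcoming tongue frames
        pairs.append((source, target))
    return pairs


# Toy usage: strings stand in for image frames.
lips = [f"lip{t}" for t in range(6)]
tongue = [f"tongue{t}" for t in range(6)]
pairs = make_seq2seq_pairs(lips, tongue, n_in=3, n_out=2)
```

In a setup like this, the target sequence comes from the recording itself, which is one way to read "unlabeled" in the abstract; a network would then be trained to map each source window to its target window.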

Updated: 2020-06-01