A deep neural network based correction scheme for improved air-tissue boundary prediction in real-time magnetic resonance imaging video
Computer Speech & Language (IF 3.1), Pub Date: 2020-09-28, DOI: 10.1016/j.csl.2020.101160
Renuka Mannem, Prasanta Kumar Ghosh

Real-time Magnetic Resonance Imaging (rtMRI) video captures vocal tract movements in the mid-sagittal plane during speech. Air-tissue boundaries (ATBs) are contours that trace the transition between the high-intensity tissue of the speech articulators and the low-intensity airway cavity in the rtMRI video. ATB segmentation of an rtMRI video is a common preprocessing step for many speech production and speech processing applications. However, it is very challenging because rtMRI images have low resolution and a low signal-to-noise ratio. Several approaches to accurate ATB segmentation have been proposed in the literature, but every technique, whether knowledge-based or data-driven, has limitations that stem from its model assumptions or from data quality. The errors in the ATBs predicted by a typical segmentation approach can be corrected in a data-driven manner as a post-processing step. In this work, we propose a deep neural network (DNN) based correction scheme for improving ATB segmentation. In this approach, each point on a predicted ATB is corrected using the pattern of intensity variation along the normal to the predicted ATB at that point. The inputs and target outputs needed for DNN training are generated using a normal-grid based method. Experimental results show that the proposed DNN-based correction yields more accurate ATBs, in terms of Dynamic Time Warping (DTW) distance, than the ATB segmentation approaches it is applied to. Thus, the DNN-based correction could be used as a post-processing step to improve the accuracy of the predicted ATBs from any segmentation scheme.
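
The abstract does not specify how the normal-grid inputs are built or how the DTW distance is normalized, so the following is only a minimal sketch of the two ideas it names: sampling an intensity profile along the normal at each point of a predicted contour, and comparing two contours with a DTW distance. All function names, the `half_len` and `step` parameters, the bilinear interpolation, and the unnormalized Euclidean DTW cost are assumptions made for illustration; they are not taken from the paper, and the DNN itself is not shown.

```python
import math

def unit_normals(contour):
    """Unit normals for an open contour [(x, y), ...].
    Tangents use central differences (one-sided at the two ends)."""
    n = len(contour)
    normals = []
    for i in range(n):
        x0, y0 = contour[max(i - 1, 0)]
        x1, y1 = contour[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0  # avoid dividing by zero
        normals.append((-ty / norm, tx / norm))  # tangent rotated by 90 degrees
    return normals

def bilinear(image, x, y):
    """Interpolate image[row][col] at fractional (x, y), clamped to the borders."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    c0, r0 = int(x), int(y)
    c1, r1 = min(c0 + 1, w - 1), min(r0 + 1, h - 1)
    fx, fy = x - c0, y - r0
    top = image[r0][c0] * (1 - fx) + image[r0][c1] * fx
    bottom = image[r1][c0] * (1 - fx) + image[r1][c1] * fx
    return top * (1 - fy) + bottom * fy

def normal_profiles(image, contour, half_len=3, step=1.0):
    """For each contour point, sample 2 * half_len + 1 intensities along its normal.
    (Hypothetical stand-in for the per-point input pattern; assumed, not the paper's.)"""
    profiles = []
    for (x, y), (nx, ny) in zip(contour, unit_normals(contour)):
        profiles.append([
            bilinear(image, x + k * step * nx, y + k * step * ny)
            for k in range(-half_len, half_len + 1)
        ])
    return profiles

def dtw_distance(a, b):
    """Unnormalized DTW cost between two point sequences (Euclidean local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

In a setup like this, each row returned by `normal_profiles` could serve as the per-point input to a correction model, and `dtw_distance` could score a predicted or corrected contour against a reference contour. A profile that crosses an air-tissue transition shows a step between low and high intensity, which is the kind of pattern the correction is described as exploiting.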




Updated: 2020-10-02