HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection. International Journal of Medical Informatics



International Journal of Medical Informatics (IF 4.9), Pub Date: 2019-08-25, DOI: 10.1016/j.ijmedinf.2019.05.023
Xiyue Wang₁, Sen Yang₁, Ming Tang₁, Heng Yin₂, Hua Huang₁, Ling He₁


BACKGROUND Patients with cleft palate are unable to achieve adequate velopharyngeal closure, which results in hypernasal speech. In clinical practice, hypernasal speech is evaluated through subjective assessment by speech-language pathologists. Automatic hypernasal speech detection can provide diagnostic support for speech-language pathologists and clinicians.

OBJECTIVES This study aims to develop a Long Short-Term Memory (LSTM) based Deep Recurrent Neural Network (DRNN) system that detects hypernasal speech in cleft palate patients, and thereby to support diagnosis for clinical operations and speech therapy. The feature mining and classification abilities of the LSTM-DRNN system are also explored.

METHODS The speech recordings comprise 14,544 Mandarin vowels collected from 144 children aged 5 to 12 years (72 children with hypernasality and 72 controls). This work proposes an LSTM-based DRNN system for automatic hypernasal speech detection, since the LSTM-DRNN can learn short-time dependencies in hypernasal speech. Vocal tract based features are fed into the LSTM-DRNN for deep feature mining. To verify the feature mining ability of the LSTM-DRNN, features projected by the LSTM-DRNN are fed into shallow classifiers in place of the two fully connected layers and the softmax layer that would otherwise follow. For comparison, features that have not been projected by the LSTM-DRNN are fed directly into shallow classifiers. Hypernasality-sensitive vowels (/a/, /i/, and /u/) are analyzed for the first time.

RESULTS The LSTM-DRNN based detection method achieves higher detection accuracy than the shallow classifiers, since the LSTM-DRNN mines features along the time axis and through network depth simultaneously. The proposed LSTM-DRNN based hypernasality detection system reaches a highest accuracy of 93.35%. The analysis of hypernasality-sensitive vowels indicates that /i/ and /u/ are the vowels most sensitive to hypernasal speech.

CONCLUSIONS The results show that the LSTM-DRNN has robust feature mining and classification ability. This is the first work to apply the LSTM-DRNN technique to automatic hypernasality detection in cleft palate speech. The experimental results demonstrate the potential of deep learning for pathological speech detection.
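
The pipeline described under METHODS (stacked LSTM layers, then two fully connected layers and a softmax layer) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the abstract gives no layer sizes, feature dimensions, or training details, so the class name `HypernasalitySketch`, the 13-dimensional frames, the hidden sizes, the random untrained weights, and the choice of the last hidden state as the "projected" feature vector are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """One LSTM layer; gate rows are stacked as [input, forget, candidate, output]."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        self.W = rand_matrix(4 * n_hidden, n_in + n_hidden, rng)
        self.b = [0.0] * (4 * n_hidden)

    def forward(self, sequence):
        n = self.n_hidden
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in sequence:
            # Gates are computed from the current frame concatenated with h.
            z = [a + b for a, b in zip(matvec(self.W, x + h), self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def dense(W, b, v, activation=None):
    out = [a + bias for a, bias in zip(matvec(W, v), b)]
    return [activation(u) for u in out] if activation else out


def softmax(logits):
    m = max(logits)
    exps = [math.exp(u - m) for u in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HypernasalitySketch:
    """Hypothetical stand-in: stacked LSTM -> two dense layers -> softmax."""

    def __init__(self, n_features=13, n_hidden=8, n_lstm_layers=2, n_dense=6, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + [n_hidden] * n_lstm_layers
        self.lstm_layers = [LSTMLayer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.W1, self.b1 = rand_matrix(n_dense, n_hidden, rng), [0.0] * n_dense
        self.W2, self.b2 = rand_matrix(2, n_dense, rng), [0.0] * 2

    def project(self, frames):
        """Return the last hidden state of the LSTM stack (assumed feature vector)."""
        seq = frames
        for layer in self.lstm_layers:
            seq = layer.forward(seq)
        return seq[-1]

    def predict_proba(self, frames):
        """Return [p(control), p(hypernasal)]; the class order is an assumption."""
        hidden = dense(self.W1, self.b1, self.project(frames), math.tanh)
        return softmax(dense(self.W2, self.b2, hidden))


# Usage with synthetic data: 20 frames of 13 made-up feature values for one vowel.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(20)]
model = HypernasalitySketch()
features = model.project(frames)
probs = model.predict_proba(frames)
```

In this sketch, `project` plays the role of the feature-mining stage: its output could be handed to a shallow classifier instead of the dense and softmax layers, which mirrors the comparison described in the abstract. Since the weights are random, the probabilities carry no diagnostic meaning.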


Updated: 2019-11-01
