Continuous Speech for Improved Learning Pathological Voice Disorders
IEEE Open Journal of Engineering in Medicine and Biology ( IF 2.7 ) Pub Date : 2022-02-14 , DOI: 10.1109/ojemb.2022.3151233
Syu-Siang Wang, Chi-Te Wang, Chih-Chung Lai, Yu Tsao, Shih-Hau Fang

Goal: Numerous studies have successfully differentiated normal from abnormal voice samples, but further classification has rarely been attempted. This study proposes a novel approach that uses continuous Mandarin speech, instead of a single vowel, to classify four common voice disorders (functional dysphonia, neoplasm, phonotrauma, and vocal palsy). Methods: In the proposed framework, acoustic signals are transformed into mel-frequency cepstral coefficients, and a bidirectional long short-term memory network (BiLSTM) is adopted to model the sequential features. The experiments were conducted on a large-scale database of 1,045 continuous speech samples collected by the speech clinic of a hospital from 2012 to 2019. Results: Compared with systems that use a single vowel, the proposed framework raised accuracy from 78.12% to 89.27% and unweighted average recall from 50.92% to 80.68%. Conclusions: The results are consistent across other machine learning algorithms, including gated recurrent units, random forest, deep neural networks, and LSTM. The sensitivities for each disorder were also analyzed, and the model capabilities were visualized via principal component analysis. An alternative experiment based on a balanced dataset again confirms the advantages of using continuous speech for learning voice disorders.
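
The abstract reports both accuracy and unweighted average recall (UAR). The two can diverge sharply when the classes are imbalanced, which is presumably why both are given. The following is a minimal sketch of the two metrics in plain Python; the function names and the toy labels are illustrative assumptions and are not taken from the paper or its data.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, with every class weighted equally."""
    total = defaultdict(int)  # number of samples per true class
    hit = defaultdict(int)    # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    recalls = [hit[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not real study data):
# a classifier that always predicts the majority class.
y_true = ["phonotrauma"] * 8 + ["neoplasm"] * 2
y_pred = ["phonotrauma"] * 10

acc = accuracy(y_true, y_pred)
uar = unweighted_average_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, UAR={uar:.2f}")
```

In this toy case the majority-class predictor looks reasonable on accuracy but scores much lower on UAR, because the minority class is never recognized.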

Updated: 2022-02-14