Modeling Obstructive Sleep Apnea voices using Deep Neural Network Embeddings and Domain-Adversarial Training
IEEE Journal of Selected Topics in Signal Processing (IF 7.5) Pub Date: 2020-02-01, DOI: 10.1109/jstsp.2019.2957977
Juan M. Perero-Codosero, Fernando Espinoza-Cuadros, Javier Anton-Martin, Miguel A. Barbero-Alvarez, Luis A. Hernandez-Gomez

Obstructive Sleep Apnea (OSA) is a sleep breathing disorder affecting at least 3–7% of male adults and 2–5% of female adults between 30 and 70 years. It produces recurrent episodes of partial or total obstruction at the level of the pharynx, which interrupt breathing during sleep. The number of obstruction episodes per sleep hour, known as the Apnea-Hypopnea Index (AHI), together with the degree of daytime sleepiness, determines the severity of OSA. Usually, OSA is diagnosed at a Sleep Unit in a hospital by the time-consuming polysomnography (PSG) test. Because the altered structure of the upper airway is expected to have anatomical and physiological effects on OSA patients’ voices, the assessment of OSA from speech has been proposed as a simple way to help in the diagnostic process. In this paper, we review previous research on assessing OSA from speech and underline the difficulty posed by the weak connection between OSA and speech. We present results from modeling OSA with Deep Learning, to the best of our knowledge for the first time, on the largest existing database of OSA voice recordings and speakers’ clinical variables. Using state-of-the-art speaker recognition techniques, namely acoustic subspace modeling (i-vectors) and deep neural network embeddings (x-vectors), we confirm the weak connection between speech and OSA. We hypothesize that this weak effect is mediated by undesired sources of variability such as speakers’ age, body mass index (BMI), or height, and we propose Domain-Adversarial Training (DAT) to remove them. Our results show that, taking BMI as the adversarial domain, accuracy increases from 69.39% to 76.60% when classifying voices from extreme OSA cases (AHI $\leq$ 10 vs. AHI $\geq$ 30). We hope these results can encourage the use of domain-adversarial neural networks to remove the undesired effects of clinical variables or other speaker factors when assessing health disorders from speech.
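
The two ideas the abstract combines can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how DAT is realized, so the gradient-reversal formulation below is an assumption (it is one common way to do domain-adversarial training), and the names `extreme_osa_label`, `GradientReversal`, `encoder_gradient`, and the weight `lam` are hypothetical. Only the thresholds AHI $\leq$ 10 and AHI $\geq$ 30 come from the abstract.

```python
from typing import List, Optional


def extreme_osa_label(ahi: float) -> Optional[int]:
    """Label a speaker for the extreme-case task described in the abstract.

    Returns 0 for AHI <= 10, 1 for AHI >= 30, and None for speakers in
    between, who are left out of the two-class comparison.
    """
    if ahi <= 10:
        return 0
    if ahi >= 30:
        return 1
    return None


class GradientReversal:
    """Assumed DAT mechanism: identity forward, gradient scaled by -lam backward."""

    def __init__(self, lam: float = 1.0) -> None:
        self.lam = lam  # hypothetical trade-off weight for the adversarial term

    def forward(self, features: List[float]) -> List[float]:
        # The domain (e.g. BMI) classifier sees the embedding unchanged.
        return list(features)

    def backward(self, grad: List[float]) -> List[float]:
        # Flipping the sign pushes the shared encoder to make the domain
        # harder to predict from the embedding.
        return [-self.lam * g for g in grad]


def encoder_gradient(
    task_grad: List[float], domain_grad: List[float], grl: GradientReversal
) -> List[float]:
    """Sum the OSA-task gradient and the reversed domain gradient."""
    reversed_domain = grl.backward(domain_grad)
    return [t + d for t, d in zip(task_grad, reversed_domain)]
```

In this sketch the shared encoder is updated with the OSA-task gradient plus the sign-flipped gradient of the domain classifier, so the embedding stays useful for OSA while carrying less information about the chosen domain variable.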

Updated: 2020-02-01