Rethinking glottal midline detection, Scientific Reports


Scientific Reports (IF 4.6), Pub Date: 2020-11-26, DOI: 10.1038/s41598-020-77216-6
Andreas M. Kist, Julian Zilker, Pablo Gómez, Anne Schützenberger, Michael Döllinger

A healthy voice is crucial for verbal communication and hence for daily as well as professional life. The basis of a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal folds. Clinically, videoendoscopy is used to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is used mostly in research because of the large amount of data and the complex, semi-automatic analysis it requires. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset with manual annotations, used both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these networks to established computer vision algorithms. We found that classical computer vision algorithms perform well at detecting the glottal midline in glottis segmentation data but are outperformed by deep neural networks on this task. We further propose GlottisNet, a multi-task neural architecture that simultaneously predicts both the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, it is a major step towards the clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy.
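To make the idea of a classical midline estimate on segmentation data concrete, the sketch below fits the principal axis of a binary glottis mask with PCA. This is an illustration only: the abstract does not name the specific algorithms that were evaluated, so the choice of PCA, the function name `midline_from_mask`, and the synthetic mask are all assumptions rather than the authors' method.

```python
import numpy as np


def midline_from_mask(mask):
    """Estimate a symmetry axis of a binary mask via PCA (illustrative assumption).

    Returns the two end points (x, y) of the principal axis, clipped to the
    extent of the foreground pixels along that axis.
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("mask needs at least two foreground pixels")

    # Foreground pixel coordinates as (x, y) rows, centered on their centroid.
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid

    # The eigenvector with the largest eigenvalue of the 2x2 covariance
    # matrix points along the direction of greatest spread.
    cov = np.cov(centered.T)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    direction = eigvecs[:, -1]

    # Project the pixels onto the axis to find where the segment starts and ends.
    proj = centered @ direction
    p_start = centroid + proj.min() * direction
    p_end = centroid + proj.max() * direction
    return p_start, p_end


# Hypothetical stand-in for a segmented glottal area: a thin vertical slit.
demo_mask = np.zeros((64, 64), dtype=bool)
demo_mask[10:50, 30:33] = True
demo_start, demo_end = midline_from_mask(demo_mask)
```

The sign of an eigenvector is arbitrary, so which end point is returned first is not fixed; a real pipeline would need an extra rule to decide which end is anterior and which is posterior.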


Updated: 2020-11-27
