The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

arXiv - CS - Sound. Pub Date: 2020-09-24, DOI: arxiv-2009.11644
Lara Orlandic, Tomas Teijeiro, David Atienza

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled as symptomatic and COVID-19 originate from countries with high infection rates, and that their expert labels are consistent. As a result, the COUGHVID dataset contributes a wealth of cough recordings for training ML models to address the world's most urgent health crises.
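The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes each recording carries a score from a cough detector in the range [0, 1] and that recordings below a chosen cutoff are discarded. The `Recording` class, its field names, and the 0.8 threshold are hypothetical placeholders for illustration and are not taken from the paper or its released code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Recording:
    recording_id: str   # hypothetical identifier
    cough_score: float  # hypothetical detector output in [0, 1]
    status: str         # self-reported status label (illustrative values)


def filter_by_cough_score(recordings: List[Recording],
                          threshold: float = 0.8) -> List[Recording]:
    """Keep recordings whose detector score is at or above the threshold.

    The default threshold is an arbitrary assumption, not a value from the paper.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    return [r for r in recordings if r.cough_score >= threshold]


# Made-up sample data for demonstration only.
samples = [
    Recording("rec-a", 0.95, "healthy"),
    Recording("rec-b", 0.10, "symptomatic"),
    Recording("rec-c", 0.80, "COVID-19"),
]

kept = filter_by_cough_score(samples)
print([r.recording_id for r in kept])
```

Raising the threshold trades dataset size for a lower share of non-cough recordings, so the cutoff would normally be tuned against a hand-checked subset.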


Updated: 2020-09-25
