The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms
arXiv - CS - Sound. Pub Date: 2020-09-24. DOI: arxiv-2009.11644. Authors: Lara Orlandic, Tomas Teijeiro, David Atienza
Cough audio signal classification has been successfully used to diagnose a
variety of respiratory conditions, and there has been significant interest in
leveraging Machine Learning (ML) to provide widespread COVID-19 screening.
However, there is currently no validated database of cough sounds with which to
train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced
cough recordings representing a wide range of subject ages, genders, geographic
locations, and COVID-19 statuses. First, we filtered the dataset using our
open-sourced cough detection algorithm. Second, experienced pulmonologists
labeled more than 2,000 recordings to diagnose medical abnormalities present in
the coughs, thereby contributing one of the largest expert-labeled cough
datasets in existence that can be used for a plethora of cough audio
classification tasks. Finally, we ensured that coughs labeled as symptomatic
and COVID-19 originate from countries with high infection rates, and that their
expert labels are consistent. As a result, the COUGHVID dataset contributes a
wealth of cough recordings for training ML models to address the world's most
urgent health crises.
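
The filtering and label-consistency steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Recording` fields, the `cough_score` detector output, and the 0.8 threshold are all hypothetical names and values chosen for illustration, and "consistent" is assumed here to mean that every expert who labeled a recording gave the same label.

```python
from dataclasses import dataclass


@dataclass
class Recording:
    record_id: str
    cough_score: float    # hypothetical cough-detector output in [0, 1]
    expert_labels: tuple  # hypothetical labels assigned by the experts


def filter_by_cough_score(recordings, threshold=0.8):
    """Keep recordings whose detector score meets the (assumed) threshold."""
    return [r for r in recordings if r.cough_score >= threshold]


def labels_consistent(recording):
    """True when at least one expert label exists and all labels agree."""
    labels = recording.expert_labels
    return len(labels) > 0 and len(set(labels)) == 1


# Made-up toy records, for illustration only.
recordings = [
    Recording("a", 0.95, ("COVID-19", "COVID-19")),
    Recording("b", 0.10, ("healthy",)),
    Recording("c", 0.88, ("COVID-19", "healthy")),
]

kept = filter_by_cough_score(recordings)
consistent_ids = [r.record_id for r in kept if labels_consistent(r)]
```

In this toy run, record "b" is dropped by the detector threshold and record "c" is dropped because its two expert labels disagree.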
Updated: 2020-09-25