Auroral classification ergonomics and the implications for machine learning
Geoscientific Instrumentation, Methods and Data Systems (IF 1.8). Pub Date: 2020-07-09. DOI: 10.5194/gi-9-267-2020. Authors: Derek McKay, Andreas Kvammen
The machine-learning research community has focused greatly on bias in
algorithms and has identified different manifestations of it. Bias in
training samples is recognised as a potential source of prejudice
in machine learning. It can be introduced by the human experts who define
the training sets. As machine-learning techniques are being applied to
auroral classification, it is important to identify and address
potential sources of expert-injected bias. In an ongoing study,
13 947 auroral images were manually classified with significant
differences between classifications. This large dataset allowed for the
identification of some of these biases, especially those originating
as a result of the ergonomics of the classification process. These
findings are presented in this paper to serve as a checklist for
improving training data integrity, not just for expert
classifications, but also for crowd-sourced, citizen science
projects. As the application of machine-learning techniques to auroral
research is relatively new, it is important that biases are identified
and addressed before they become endemic in the corpus of training
data.
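
One way to put a number on "differences between classifications" is an inter-rater agreement statistic. The sketch below computes Cohen's kappa for two labellers who classified the same images. It is an illustration only: the abstract does not say which statistic the authors used, and the function name and the class labels ("arc", "diffuse") are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two labellers who rated the same items.

    Illustrative helper, not taken from the paper. Returns 1.0 for
    perfect agreement and 0.0 for agreement no better than chance.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Fraction of items on which the two labellers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each labeller's class frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both labellers used one identical class for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four images from two experts.
expert_1 = ["arc", "arc", "diffuse", "diffuse"]
expert_2 = ["arc", "diffuse", "diffuse", "arc"]
print(cohen_kappa(expert_1, expert_2))
```

A low kappa between experts would flag label disagreement, but it would not by itself show whether the cause is the ergonomics of the classification process or something else.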
Updated: 2020-08-20