Auroral classification ergonomics and the implications for machine learning
Geoscientific Instrumentation, Methods and Data Systems (IF 1.8), Pub Date: 2020-07-09, DOI: 10.5194/gi-9-267-2020
Derek McKay , Andreas Kvammen

The machine-learning research community has focused greatly on bias in algorithms and has identified different manifestations of it. Bias in training samples is recognised as a potential source of prejudice in machine learning. It can be introduced by the human experts who define the training sets. As machine-learning techniques are being applied to auroral classification, it is important to identify and address potential sources of expert-injected bias. In an ongoing study, 13 947 auroral images were manually classified, with significant differences between classifications. This large dataset allowed for the identification of some of these biases, especially those originating as a result of the ergonomics of the classification process. These findings are presented in this paper to serve as a checklist for improving training data integrity, not just for expert classifications, but also for crowd-sourced, citizen science projects. As the application of machine-learning techniques to auroral research is relatively new, it is important that biases are identified and addressed before they become endemic in the corpus of training data.
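The abstract does not describe the authors' analysis pipeline, but the "significant differences between classifications" it mentions are the kind of disagreement commonly quantified with an inter-rater statistic such as Cohen's kappa. The sketch below is a minimal, self-contained illustration of that idea; the class names and ratings are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical auroral classes; the paper does not list its label set here.
CLASSES = ["arc", "diffuse", "discrete", "no_aurora"]

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed proportion of images on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each rater's marginal class frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c]
                   for c in set(count_a) | set(count_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Toy example: two raters labelling the same six images.
rater_1 = ["arc", "arc", "diffuse", "discrete", "no_aurora", "diffuse"]
rater_2 = ["arc", "diffuse", "diffuse", "discrete", "no_aurora", "arc"]
print(f"kappa = {cohen_kappa(rater_1, rater_2):.2f}")  # ~0.54 here
```

A kappa well below 1 on a large labelled set such as the 13 947 images mentioned above would flag systematic disagreement between experts, which is the kind of signal that motivates the ergonomics checklist the paper proposes.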

Updated: 2020-08-20