Interactive visual labelling versus active learning: an experimental comparison
Frontiers of Information Technology & Electronic Engineering (IF 2.7), Pub Date: 2020-04-30, DOI: 10.1631/fitee.1900549
Mohammad Chegini, Jürgen Bernard, Jian Cui, Fatemeh Chegini, Alexei Sourin, Keith Andrews, Tobias Schreck

Methods from supervised machine learning allow new data to be classified automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations (similarity map, scatterplot matrix (SPLOM), and parallel coordinates) with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.
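
As a rough illustration of how an active learning algorithm can pick instances to label, the sketch below uses least-confidence uncertainty sampling, one common query strategy. The abstract does not say which active learning strategies the study used, so this choice is an assumption, and the function name `least_confident` and the probability values are hypothetical.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int = 1) -> List[int]:
    """Return the indices of the k instances the classifier is least sure about.

    `probabilities[i]` holds the predicted class probabilities for unlabelled
    instance i. Confidence is taken as the highest class probability, so a low
    maximum means the classifier has no clear favourite for that instance.
    """
    confidence = [max(p) for p in probabilities]
    # Sort instance indices from least to most confident and keep the first k.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical predictions for four unlabelled instances and three classes.
pool = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.55, 0.44, 0.01],
    [0.80, 0.10, 0.10],
]

# Ask the analyst (or an oracle) to label the two most uncertain instances.
query = least_confident(pool, k=2)
```

In a full loop, the selected instances would be labelled, added to the training set, and the classifier retrained before the next query. With interactive visual labelling, the analyst makes this selection from the visualisation instead.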




Updated: 2020-04-30