Interactive visual labelling versus active learning: an experimental comparison

Chegini, Mohammad; Bernard, Jürgen; Cui, Jian; Chegini, Fatemeh; Sourin, Alexei; Andrews, Keith; Schreck, Tobias

doi:10.1631/FITEE.1900549

Published: 30 April 2020

Volume 21, pages 524–535, (2020)

Frontiers of Information Technology & Electronic Engineering

Mohammad Chegini¹,² (ORCID: orcid.org/0000-0002-3516-8685),
Jürgen Bernard³,
Jian Cui²,
Fatemeh Chegini⁴,
Alexei Sourin²,
Keith Andrews⁵ &
Tobias Schreck¹

Abstract

Supervised machine learning methods classify new data automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often tedious and time-consuming. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations (similarity map, scatterplot matrix (SPLOM), and parallel coordinates) with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.
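To make the idea of an active learning algorithm choosing instances for labelling more concrete, the following is a minimal sketch of pool-based uncertainty sampling. The abstract does not say which selection strategies the study used, so the smallest-margin criterion here is an assumption chosen only for illustration. The function names `margin` and `select_queries` and the probability values are hypothetical and are not taken from the paper or from mVis.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most probable classes; a small gap means the
    classifier is uncertain about this instance."""
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_queries(pool_probs: Sequence[Sequence[float]],
                   batch_size: int = 1) -> List[int]:
    """Return the indices of the `batch_size` unlabelled instances with the
    smallest margin, i.e. the ones a label would presumably help most."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: margin(pool_probs[i]))
    return ranked[:batch_size]


# Hypothetical class-probability estimates from some classifier for four
# unlabelled instances and three classes (illustrative values only).
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # uncertain
    [0.55, 0.30, 0.15],  # moderately confident
    [0.34, 0.33, 0.33],  # nearly uniform, smallest margin
]

queries = select_queries(pool_probs, batch_size=2)
print(queries)
```

In a full labelling loop, the selected instances would be shown to the analyst, the new labels added to the training set, and the classifier retrained before the next round of selection. Interactive visual labelling replaces the automatic selection step with the analyst's own choice from a visual overview.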


Author information

Authors and Affiliations

Institute of Computer Graphics and Knowledge Visualisation, Graz University of Technology, Graz, 8010, Austria
Mohammad Chegini & Tobias Schreck
School of Computer Science and Engineering, Nanyang Technological University, Singapore, 639798, Singapore
Mohammad Chegini, Jian Cui & Alexei Sourin
InfoVis Group, University of British Columbia, Vancouver, V6T1Z4, Canada
Jürgen Bernard
Max Planck Institute for Meteorology, Hamburg, 20146, Germany
Fatemeh Chegini
Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, 8010, Austria
Keith Andrews

Contributions

Mohammad CHEGINI implemented the mVis system and the code needed to conduct the study. Mohammad CHEGINI and Jürgen BERNARD designed the study. Mohammad CHEGINI and Fatemeh CHEGINI drafted the manuscript. Jian CUI helped conduct the experiment and process the data. Alexei SOURIN, Keith ANDREWS, and Tobias SCHRECK contributed to the definition of the underlying research questions, and revised and finalized the manuscript.

Corresponding author

Correspondence to Mohammad Chegini.

Additional information

Compliance with ethics guidelines

Mohammad CHEGINI, Jürgen BERNARD, Jian CUI, Fatemeh CHEGINI, Alexei SOURIN, Keith ANDREWS, and Tobias SCHRECK declare that they have no conflict of interest.

About this article

Cite this article

Chegini, M., Bernard, J., Cui, J. et al. Interactive visual labelling versus active learning: an experimental comparison. Front Inform Technol Electron Eng 21, 524–535 (2020). https://doi.org/10.1631/FITEE.1900549

Received: 06 October 2019
Accepted: 17 January 2020
Published: 30 April 2020
Issue Date: April 2020
DOI: https://doi.org/10.1631/FITEE.1900549

CLC number

TP311
