Abstract
Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data they ingest. Unfortunately, such analysis can be difficult even for domain experts, which complicates the training of the models intended to drive experiments. This is especially true when the goal is to identify weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the monoclinic-to-tetragonal phase transition in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern, manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and their Shannon entropy to capture, preserve, and propagate consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate Shannon entropy-weighted scoring as a test of the performance of machine learning-generated labels. Finally, we propose a materials data challenge centered on generating improved labeling algorithms. This real-world dataset, curated with expert labels, can act as a test bed for new algorithms. The raw data, annotations, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models.
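As an illustration of the consensus-labeling idea described above, the following minimal Python sketch computes the mode and the Shannon entropy of the labels assigned to each diffraction pattern, then scores a set of predicted labels against the expert mode. The label names ("M" for monoclinic, "T" for tetragonal), the example data, and in particular the weighting w = 1 - H/H_max are assumptions made for illustration; they are not taken from the released code, and the scoring used in the study may differ.

```python
import math
from collections import Counter


def consensus_and_entropy(labels):
    """Return (mode label, Shannon entropy in bits) for one pattern's labels."""
    counts = Counter(labels)
    total = len(labels)
    # Ties are broken by first occurrence, which is how Counter orders equal counts.
    mode_label = counts.most_common(1)[0][0]
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return mode_label, entropy


def entropy_weighted_accuracy(expert_labels, predicted, n_classes):
    """Accuracy against the expert mode, down-weighting contested patterns.

    Assumed weighting (hypothetical): w = 1 - H / log2(n_classes), so a pattern
    on which the experts fully agree has weight 1 and a maximally contested
    pattern has weight 0.
    """
    max_entropy = math.log2(n_classes)
    weighted_hits = 0.0
    weight_sum = 0.0
    for labels, pred in zip(expert_labels, predicted):
        mode_label, h = consensus_and_entropy(labels)
        w = 1.0 - h / max_entropy
        weighted_hits += w * (pred == mode_label)
        weight_sum += w
    return weighted_hits / weight_sum if weight_sum else float("nan")


# Hypothetical labels from four experts for three diffraction patterns.
expert_labels = [
    ["M", "M", "M", "M"],  # full agreement
    ["M", "T", "T", "T"],  # mild disagreement
    ["M", "M", "T", "T"],  # evenly split, e.g. at a phase boundary
]
predicted = ["M", "M", "T"]  # hypothetical algorithm output

score = entropy_weighted_accuracy(expert_labels, predicted, n_classes=2)
print(f"entropy-weighted accuracy: {score:.3f}")
```

With this choice of weights, a mistake on an evenly split pattern does not change the score, whereas a mistake on a pattern the experts agree on is penalized in full. Other weightings are possible and would change how strongly boundary patterns count.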
Notes
Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.
The data and code for generating the figures have been uploaded to data.gov pending final approval for release. They have also been provided as a ZIP file addendum to the manuscript.
Acknowledgements
The authors thank Marcus Mendenhall and Kamal Choudhary for their careful reading of our manuscript and helpful suggestions for its improvement. We also kindly thank our anonymous reviewer who worked with us to improve the focus of the original manuscript. Diffraction measurements were performed at the University of Maryland X-ray Crystallographic Center. The work at the National Renewable Energy Laboratory (NREL), operated by Alliance for Sustainable Energy LLC for the US Department of Energy (DOE) under Contract No. DE-AC36-08GO28308, was funded by the Laboratory Directed Research and Development (LDRD) Program. The views expressed in the article do not necessarily represent the views of the DOE or the US Government.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hattrick-Simpers, J.R., DeCost, B., Kusne, A.G. et al. An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models. Integr Mater Manuf Innov 10, 311–318 (2021). https://doi.org/10.1007/s40192-021-00213-8
DOI: https://doi.org/10.1007/s40192-021-00213-8