An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models

  • Technical Article
  • Integrating Materials and Manufacturing Innovation

Abstract

Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data ingested by these models. Unfortunately, accurate analysis of the underlying data can be difficult, even for domain experts, which complicates the training of the models intended to drive experiments. This is especially true when the goal is to identify weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the phase transition from monoclinic to tetragonal in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern, manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and their Shannon entropy to capture, preserve, and propagate the consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate the use of Shannon-entropy-weighted scoring to test the performance of machine-learning-generated labels. Finally, we propose a materials data challenge centered on generating improved labeling algorithms. This real-world dataset, curated with expert labels, can act as a test bed for new algorithms. The raw data, annotations, and code used in this study are all available online at data.gov, and the interested reader is encouraged to replicate and improve the existing models.
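The consensus-and-uncertainty idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' released code: the label strings `"M"` and `"T"` (monoclinic and tetragonal), the function names, and in particular the weighting `1 - H / log2(n_classes)` are assumptions chosen for illustration, since the abstract does not give the exact scoring formula.

```python
import math
from collections import Counter


def consensus_and_entropy(labels):
    """Return (mode label, Shannon entropy in bits) for one pattern's labels.

    Ties for the mode are broken by first occurrence (Counter ordering).
    """
    counts = Counter(labels)
    total = len(labels)
    mode_label = counts.most_common(1)[0][0]
    entropy = sum(
        -(c / total) * math.log2(c / total) for c in counts.values()
    )
    return mode_label, entropy


def entropy_weighted_accuracy(predicted, expert_labels, n_classes=2):
    """Accuracy against the expert mode, down-weighting contested patterns.

    Assumed weighting (hypothetical): w = 1 - H / log2(n_classes), so a
    pattern on which the experts split evenly contributes nothing.
    """
    max_entropy = math.log2(n_classes)
    score = 0.0
    weight_sum = 0.0
    for pred, labels in zip(predicted, expert_labels):
        mode_label, h = consensus_and_entropy(labels)
        w = 1.0 - h / max_entropy
        weight_sum += w
        score += w * (pred == mode_label)
    return score / weight_sum if weight_sum > 0 else 0.0


# Hypothetical expert labels for three diffraction patterns.
experts = [
    ["M", "M", "M", "M"],  # unanimous: entropy 0
    ["M", "M", "T", "T"],  # evenly split: entropy 1 bit
    ["T", "T", "T", "T"],  # unanimous: entropy 0
]
model_predictions = ["M", "T", "M"]
print(entropy_weighted_accuracy(model_predictions, experts))
```

Under this assumed weighting, disagreement with the mode on an evenly split pattern costs nothing, while an error on a unanimously labeled pattern counts in full. Other weightings, such as comparing the model's label distribution to the experts' distribution, would be equally reasonable choices.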

Notes

  1. Certain commercial equipment, instruments, or materials are identified in this paper in order to specify the experimental procedure adequately. Such identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the materials or equipment identified are necessarily the best available for the purpose.

  2. The data and code for generating the figures have been uploaded to data.gov pending final approval for release. They have been provided as a ZIP file addendum to the manuscript.

Acknowledgements

The authors thank Marcus Mendenhall and Kamal Choudhary for their careful reading of our manuscript and helpful suggestions for its improvement. We also kindly thank our anonymous reviewer who worked with us to improve the focus of the original manuscript. Diffraction measurements were performed at the University of Maryland X-ray Crystallographic Center. The work at the National Renewable Energy Laboratory (NREL), operated by Alliance for Sustainable Energy LLC for the US Department of Energy (DOE) under Contract No. DE-AC36-08GO28308, was funded by the Laboratory Directed Research and Development (LDRD) Program. The views expressed in the article do not necessarily represent the views of the DOE or the US Government.

Author information

Corresponding author

Correspondence to Jason R. Hattrick-Simpers.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hattrick-Simpers, J.R., DeCost, B., Kusne, A.G. et al. An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models. Integr Mater Manuf Innov 10, 311–318 (2021). https://doi.org/10.1007/s40192-021-00213-8

  • DOI: https://doi.org/10.1007/s40192-021-00213-8
