Abstract
New featurization schemes that describe materials as composition vectors for machine-learning property prediction are common in the field of Materials Informatics. However, little is known about the comparative efficacy of these methods. This work sets out to clarify which featurization methods should be used under which circumstances. Surprisingly, our findings show that simple fractional and random-noise representations of elements can be as effective as traditional and new descriptors when large amounts of data are available. However, in the absence of large datasets, or for data that are not fully representative, we show that the integration of domain knowledge offers advantages in predictive ability.
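To make the two simplest representations concrete, the following is a minimal Python sketch of a fractional encoding and a random-noise encoding of a chemical formula. It is an illustration under stated assumptions, not the implementation used in this work: the truncated `ELEMENTS` list, the helper names, the embedding dimension `dim`, and the fraction-weighted sum used to pool element vectors are all hypothetical choices made here for brevity.

```python
import re
import numpy as np

# Hypothetical, truncated element list; a real featurizer would cover
# the full periodic table.
ELEMENTS = ["H", "Li", "O", "Na", "Al", "Si", "Cl", "Ti", "Fe"]


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'Al2O3'.

    Assumption: no parentheses or nested groups in the formula.
    """
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def fractional_vector(formula):
    """Fractional encoding: one slot per element, holding its atomic fraction."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    vec = np.zeros(len(ELEMENTS))
    for symbol, amount in counts.items():
        vec[ELEMENTS.index(symbol)] = amount / total
    return vec


def random_noise_vector(formula, dim=8, seed=0):
    """Random-noise encoding: each element gets a fixed random vector that
    carries no chemical knowledge.

    Assumption: element vectors are pooled by a fraction-weighted sum.
    """
    rng = np.random.default_rng(seed)
    element_table = rng.normal(size=(len(ELEMENTS), dim))
    return fractional_vector(formula) @ element_table


frac = fractional_vector("Al2O3")       # 0.4 in the Al slot, 0.6 in the O slot
noise = random_noise_vector("Al2O3")    # length-8 vector, reproducible for a fixed seed
```

Neither encoding contains domain knowledge: the fractional vector only records which elements are present and in what proportion, and the random-noise table only distinguishes elements from one another. A domain-knowledge descriptor would instead fill the element table with measured or computed elemental properties.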
Acknowledgements
The authors gratefully acknowledge support from the NSF CAREER Award DMR 1651668. The authors also thank the Berlin International Graduate School in Model and Simulation based Research as well as the German Academic Exchange Service (Program No. 57438025) for their financial support. Special thanks are given to Dr. Aleksander Gurlo for advising Anthony Yu-Tung Wang and encouraging his collaborative stay at the University of Utah. The authors thank the creators of AFLOW for creating the database and for making the material properties available for this study. In addition, the authors express their gratitude to the open-source software community for developing the excellent tools used in this research, including but not limited to Python, Pandas, NumPy, matplotlib, scikit-learn, and TensorFlow.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Murdock, R.J., Kauwe, S.K., Wang, A.YT. et al. Is Domain Knowledge Necessary for Machine Learning Materials Properties?. Integr Mater Manuf Innov 9, 221–227 (2020). https://doi.org/10.1007/s40192-020-00179-z
DOI: https://doi.org/10.1007/s40192-020-00179-z