Semi-supervised machine-learning classification of materials synthesis procedures
npj Computational Materials (IF 9.4) Pub Date: 2019-07-08, DOI: 10.1038/s41524-019-0204-1
Haoyan Huo, Ziqin Rong, Olga Kononova, Wenhao Sun, Tiago Botari, Tanjin He, Vahe Tshitoyan, Gerbrand Ceder

Digitizing large collections of scientific literature can enable new informatics approaches for scientific analysis and meta-analysis. However, most content in the scientific literature is locked up in written natural language, which is difficult to parse into databases using explicitly hard-coded classification rules. In this work, we demonstrate a semi-supervised machine-learning method to classify inorganic materials synthesis procedures from written natural language. Without any human input, latent Dirichlet allocation can cluster keywords into topics corresponding to specific experimental materials synthesis steps, such as “grinding” and “heating”, “dissolving” and “centrifuging”, etc. Guided by a modest amount of annotation, a random forest classifier can then associate these steps with different categories of materials synthesis, such as solid-state or hydrothermal synthesis. Finally, we show that a Markov chain representation of the order of experimental steps accurately reconstructs a flowchart of possible synthesis procedures. Our machine-learning approach offers a scalable way to unlock the large amount of inorganic materials synthesis information in the literature and to process it into a standardized, machine-readable database.
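
To make the last stage of the abstract concrete, the following is a minimal sketch of a first-order Markov chain estimated from ordered synthesis steps. It is not the authors' implementation: the function name `transition_probabilities` and the three example procedures are hypothetical, and the step labels are written by hand here, whereas the paper derives them from topic modelling.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities from step sequences."""
    counts = defaultdict(Counter)
    for steps in sequences:
        # Count each adjacent (current step -> next step) pair.
        for current, following in zip(steps, steps[1:]):
            counts[current][following] += 1
    # Normalize each row of counts so the outgoing probabilities sum to 1.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


# Hypothetical step sequences, invented for illustration only.
procedures = [
    ["mixing", "grinding", "heating", "cooling"],
    ["mixing", "grinding", "heating", "grinding", "heating"],
    ["dissolving", "heating", "centrifuging", "drying"],
]

probs = transition_probabilities(procedures)
for step, row in sorted(probs.items()):
    print(step, row)
```

Each row of `probs` gives the estimated probability of the next step given the current one. Drawing an edge for every nonzero entry would produce a flowchart of the kind the abstract describes, under the assumption that a step depends only on the step immediately before it.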




Updated: 2019-11-18