A Design-to-Device Pipeline for Data-Driven Materials Discovery.
Accounts of Chemical Research (IF 18.3), Pub Date: 2020-02-25, DOI: 10.1021/acs.accounts.9b00470
Jacqueline M Cole 1, 2, 3, 4

The world needs new materials to stimulate the chemical industry in key sectors of our economy: environment and sustainability, information storage, optical telecommunications, and catalysis. Yet, nearly all functional materials are still discovered by "trial-and-error", of which the lack of predictability affords a major materials bottleneck to technological innovation. The average "molecule-to-market" lead time for materials discovery is currently 20 years. This is far too long for industrial needs, as highlighted by the Materials Genome Initiative, which has ambitious targets of up to 4-fold reductions in average molecule-to-market lead times. Such a large step change in progress can only be realistically achieved if one adopts an entirely new approach to materials discovery. Fortunately, a fundamentally new approach to materials discovery has been emerging, whereby data science with artificial intelligence offers a prospective solution to speed up these average molecule-to-market lead times.

This approach is known as data-driven materials discovery. Its broad prospects have only recently become a reality, given the timely and major advances in "big data", artificial intelligence, and high-performance computing (HPC). Access to massive data sets has been stimulated by government-regulated open-access requirements for data and literature. Natural-language processing (NLP) and machine-learning (ML) tools that can mine data and find patterns therein are becoming mainstream. Exascale HPC capabilities that can aid data mining and pattern recognition and also generate their own data from calculations are now within our grasp. These timely advances present an ideal opportunity to develop data-driven materials-discovery strategies to systematically design and predict new chemicals for a given device application.

This Account shows how data science can afford materials discovery via a four-step "design-to-device" pipeline that entails (1) data extraction, (2) data enrichment, (3) material prediction, and (4) experimental validation. Massive databases of cognate chemical and property information are first forged from "chemistry-aware" natural-language-processing tools, such as ChemDataExtractor, and enriched using machine-learning methods and high-throughput quantum-chemical calculations. New materials for a bespoke application can then be predicted by mining these databases with algorithmic encodings of relationships between chemical structures and physical properties that are known to deliver functional materials. These may take the form of classification, enumeration, or machine-learning algorithms. A data-mining workflow short-lists these predictions to a handful of lead candidate materials that go forward to experimental validation. This design-to-device approach is being developed to offer a roadmap for the accelerated discovery of new chemicals for functional applications. Case studies presented demonstrate its utility for photovoltaic, optical, and catalytic applications. While this Account is focused on applications in the physical sciences, the generic pipeline discussed is readily transferable to other scientific disciplines such as biology and medicine.
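
The four-step pipeline can be pictured as a chain of small functions. The following is a minimal Python sketch under stated assumptions, not the Account's implementation: the `Record` type, the `absorption_nm` field, the property window, the constant estimator, and the "dye" entries with their values are all hypothetical placeholders invented for illustration. Text mining with a tool such as ChemDataExtractor is replaced here by ready-made rows, and the quantum-chemical calculation by a stand-in function.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """One chemical with one property value; the fields are illustrative only."""
    name: str
    absorption_nm: Optional[float]  # None means no value was found in the literature


def extract(raw_rows: Iterable[dict]) -> List[Record]:
    """Step 1 (data extraction): stand-in for text mining; parses ready-made rows."""
    return [Record(r["name"], r.get("absorption_nm")) for r in raw_rows]


def enrich(records: List[Record], estimate: Callable[[str], float]) -> List[Record]:
    """Step 2 (data enrichment): fill missing values with a calculated estimate."""
    return [
        r if r.absorption_nm is not None else replace(r, absorption_nm=estimate(r.name))
        for r in records
    ]


def predict(records: List[Record], low: float, high: float) -> List[Record]:
    """Step 3 (material prediction): keep records that satisfy an encoded
    structure-property criterion, here a simple window on one property."""
    return [r for r in records if low <= r.absorption_nm <= high]


def shortlist(records: List[Record], target: float, k: int) -> List[Record]:
    """Rank by closeness to the target and keep k leads for step 4
    (experimental validation), which happens in the laboratory, not in code."""
    return sorted(records, key=lambda r: abs(r.absorption_nm - target))[:k]


if __name__ == "__main__":
    # Hypothetical input rows; names and numbers are made up for the example.
    rows = [
        {"name": "dye-A", "absorption_nm": 520.0},
        {"name": "dye-B"},
        {"name": "dye-C", "absorption_nm": 310.0},
        {"name": "dye-D", "absorption_nm": 498.0},
    ]
    database = enrich(extract(rows), estimate=lambda name: 505.0)
    leads = shortlist(predict(database, low=450.0, high=550.0), target=500.0, k=2)
    print([r.name for r in leads])  # ['dye-D', 'dye-B']
```

Keeping each stage as a separate function mirrors the way the abstract separates database building from database mining: the enrichment estimator or the prediction criterion could be swapped for a machine-learning model or a classification rule without touching the other stages.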

Updated: 2020-02-25