Representation of molecular structures with persistent homology for machine learning applications in chemistry.
Nature Communications (IF 14.7), Pub Date: 2020-06-26, DOI: 10.1038/s41467-020-17035-5
Jacob Townsend, Cassie Putman Micucci, John H Hymel, Vasileios Maroulas, Konstantinos D Vogiatzis

Machine learning and high-throughput computational screening have been valuable tools in accelerated first-principles screening for the discovery of the next generation of functionalized molecules and materials. Applying machine learning to chemistry requires converting molecular structures into a machine-readable format known as a molecular representation. The choice of representation affects the performance and outcomes of chemical machine learning methods. Herein, we present a new, concise molecular representation derived from persistent homology, an applied branch of mathematics. We demonstrate its applicability in a high-throughput computational screening of a large molecular database (GDB-9) containing more than 133,000 organic molecules. Our target is to identify novel molecules that selectively interact with CO2. The methodology and performance of the novel molecular fingerprinting method are presented, and the new chemically driven persistence image representation is used to screen the GDB-9 database and suggest molecules and/or functional groups with enhanced properties.




Updated: 2020-06-26