Machine learning methods to predict the crystallization propensity of small organic molecules
CrystEngComm (IF 2.6), Pub Date: 2020-03-26, DOI: 10.1039/d0ce00070a
Florbela Pereira

Machine learning (ML) algorithms were explored for predicting crystallization propensity from molecular descriptors and fingerprints generated from 2D chemical structures, and from 3D molecular descriptors computed on 3D chemical structures optimized with empirical methods. In total, 57 815 molecules were retrieved from the Reaxys® database, of which 53 998 are recorded as crystalline (class A), 3097 as polymorphic (class B), and 720 as amorphous (class C). A training data set of 40 462 organic molecules was used to build the models, which were validated with an external test set of 17 353 organic molecules. Several ML algorithms, including random forest (RF), support vector machines (SVM), and deep learning multilayer perceptron networks (MLP), were screened. The best performance was achieved with a consensus classification model combining the RF, SVM, and MLP models, which predicted the external test set with an overall predictive accuracy (Q) of up to 80%.
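The figures in the abstract can be checked with a few lines of Python, and the idea of a consensus over three classifiers can be illustrated at the same time. This is only a minimal sketch: the abstract does not say how the RF, SVM, and MLP predictions were combined, so the majority vote, the tie-break rule, and the names `consensus` and `overall_accuracy` below are assumptions made for illustration, not the author's method.

```python
from collections import Counter

# Counts quoted in the abstract.
CLASS_COUNTS = {"A": 53998, "B": 3097, "C": 720}  # crystalline, polymorphic, amorphous
TRAIN_SIZE, TEST_SIZE = 40462, 17353

total = sum(CLASS_COUNTS.values())
class_fractions = {label: n / total for label, n in CLASS_COUNTS.items()}
train_fraction = TRAIN_SIZE / total


def consensus(votes, tie_break="A"):
    """Hypothetical majority vote over per-model class labels.

    `votes` holds one label per model, e.g. (rf_label, svm_label, mlp_label).
    When the top two labels are tied, `tie_break` is returned; this rule is
    an assumption, not something stated in the abstract.
    """
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


def overall_accuracy(y_true, y_pred):
    """Fraction of molecules whose predicted class matches the recorded class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(f"total molecules: {total}")
    print(f"training fraction: {train_fraction:.3f}")
    for label, fraction in class_fractions.items():
        print(f"class {label}: {fraction:.3%}")
```

The class counts add up to the stated total, as do the training and test sets, and the split works out to roughly 70/30. The class fractions also make the strong imbalance toward class A explicit, which is worth keeping in mind when reading an overall accuracy figure.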

Updated: 2020-03-26