Assessment of Predicting Frontier Orbital Energies for Small Organic Molecules Using Knowledge-Based and Structural Information
ACS Engineering Au (IF 4.3), Pub Date: 2022-04-22, DOI: 10.1021/acsengineeringau.2c00011
Zong-Rong Ye, Sheng-Hsuan Hung, Berlin Chen, Ming-Kang Tsai

A systematic comparison is presented of methods for predicting frontier orbital energies (the highest occupied molecular orbital (HOMO) energy, EH; the lowest unoccupied molecular orbital (LUMO) energy, EL; and the energy gap, ΔEHL) for the molecules in the QM9 dataset, which contains more than 120k three-dimensional organic molecular structures determined by first-principles simulations. The target properties (EH, EL, and ΔEHL) are predicted using linear regression (LR), a machine learning model (random forest, RF), and a continuous-filter convolutional neural network (SchNET). LR and RF models built on various knowledge-based descriptors derived from the molecules' SMILES strings predict the target properties with mean absolute errors (MAEs) of 4–6 times the chemical accuracy (0.043 eV). The best approach, SchNET with a graph representation derived from molecular Cartesian coordinates, gives MAEs of 0.051, 0.041, and 0.076 eV for EH, EL, and ΔEHL, respectively. Introducing a bond-step matrix representation into the SchNET model substantially reduces the computational cost of dataset preparation, while the MAEs increase moderately to 2–3 times the chemical accuracy. The chemical interpretation of the important descriptors identified by the LR and RF models appears consistent with established chemical knowledge of these molecular electronic properties, although these models come with larger, if tolerable, prediction errors. The combination of the bond-step representation and the SchNET model offers an assessable and balanced option for high-throughput screening of organic molecules and for the development of data science approaches.
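The abstract reports errors as multiples of chemical accuracy (0.043 eV), so the LR/RF range of 4–6 times corresponds to roughly 0.17–0.26 eV. The minimal sketch below shows that arithmetic for the SchNET MAEs quoted above. The helper names are hypothetical, and the gap is assumed to follow the usual convention ΔEHL = EL − EH.

```python
# Chemical accuracy threshold quoted in the abstract (eV).
CHEMICAL_ACCURACY_EV = 0.043


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def in_chemical_accuracy_units(mae_ev):
    """Express an MAE in eV as a multiple of chemical accuracy."""
    return mae_ev / CHEMICAL_ACCURACY_EV


def homo_lumo_gap(e_homo, e_lumo):
    """Assumed convention: gap = E_LUMO - E_HOMO (eV)."""
    return e_lumo - e_homo


# SchNET MAEs reported in the abstract (eV).
reported = {"E_H": 0.051, "E_L": 0.041, "dE_HL": 0.076}
for name, mae in reported.items():
    ratio = in_chemical_accuracy_units(mae)
    print(f"{name}: {mae:.3f} eV = {ratio:.2f} x chemical accuracy")
```

Run as-is, this prints ratios of 1.19, 0.95, and 1.77, i.e. the coordinate-based SchNET model sits at or near chemical accuracy for EH and EL, and below twice that for the gap.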
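The abstract does not define the bond-step matrix in detail. A plausible reading, assumed here, is a topological distance matrix: entry (i, j) counts the bonds on the shortest path between atoms i and j in the molecular graph, which can be built from connectivity alone (for example, parsed from SMILES) without computing 3D coordinates. The sketch below computes such a matrix by breadth-first search; the function `bond_step_matrix` and its input format are hypothetical, not the authors' code. RDKit's `Chem.GetDistanceMatrix` produces a comparable topological distance matrix from an RDKit molecule.

```python
from collections import deque


def bond_step_matrix(n_atoms, bonds):
    """Hypothetical helper: topological (bond-count) distance matrix.

    Entry [i][j] is the number of bonds on the shortest path between
    atoms i and j, or -1 if the atoms are not connected.
    """
    # Build an undirected adjacency list from the bond pairs.
    adjacency = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    matrix = []
    for source in range(n_atoms):
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = [-1] * n_atoms
        dist[source] = 0
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if dist[neighbor] == -1:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        matrix.append(dist)
    return matrix


# Heavy atoms of ethanol (C0-C1-O2), hydrogens omitted.
print(bond_step_matrix(3, [(0, 1), (1, 2)]))
# [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
```

Because it needs only bond connectivity, a matrix like this avoids the geometry optimization that coordinate-based graphs require, which is consistent with the reduced dataset-preparation cost described in the abstract.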

Updated: 2022-04-22