Chemical space deconstruction-based dynamic model ensemble architecture for molecular property prediction
Chemical Engineering Science (IF 4.7), Pub Date: 2024-04-10, DOI: 10.1016/j.ces.2024.120118
Huaqiang Wen, Shihao Nan, Jun Zhang, Zhigang Lei, Weifeng Shen

Machine learning (ML) based prediction of molecular properties has become a fast engine for designing green solvents, catalysts, functional materials, drugs, and other chemical products. However, the accuracy and stability of ML-based models can be impeded by poor data quality, an issue that is rarely studied in chemical product discovery and design. Inspired by dynamic ensemble selection (DES), this work proposes an improved DES based on chemical space deconstruction for the prediction of molecular properties. A chemical space representation and deconstruction model based on a self-organizing map (SOM) neural network is developed, which facilitates rapid implementation of the improved DES on molecular samples. On this basis, a novel dynamic model ensemble architecture (SOM-DES) is proposed as a model enhancement technology for building a more accurate and stable ensemble model, with the aim of improving predictive performance on chemical subspaces with poor-quality data. To realize the architecture, a supervised dimensionality reduction algorithm is improved to enhance the mining of molecular feature information for DES optimization. In addition, a novel resampling strategy that combines the geometric synthetic minority oversampling technique (G-SMOTE) with chemical space deconstruction is proposed as a data augmentation technology to mitigate the effect of unbalanced data during DES training. Prediction of the ideal-gas enthalpy of formation is used as a case study to demonstrate the superiority of the proposed SOM-DES. The results indicate that SOM-DES (R = 0.9731, RMSE = 55.4639) outperforms the traditional static ensemble strategy (SES, R = 0.9552, RMSE = 71.5045) in prediction accuracy over the global chemical space. More importantly, for chemical subspaces that are difficult to predict because of low data quality, SOM-DES shows a significant reduction in prediction errors compared with SES.
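
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea and not the authors' SOM-DES. It assumes that a small self-organizing map partitions the feature space into nodes (standing in for "chemical subspaces") and that, for each node, the base model with the lowest validation RMSE in that node is selected. The toy SOM, the selection rule, and the function names `train_som`, `assign_nodes`, `fit_des`, and `predict_des` are all hypothetical. The supervised dimensionality reduction and G-SMOTE resampling steps are omitted.

```python
import numpy as np

def train_som(X, grid=(3, 3), epochs=20, lr0=0.5, sigma0=1.0, seed=0):
    """Fit a toy self-organizing map; returns node weights (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each node, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise node weights from randomly chosen training samples.
    W = X[rng.choice(len(X), size=rows * cols, replace=True)].astype(float).copy()
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighbourhood
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))  # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)
            step += 1
    return W

def assign_nodes(W, X):
    """Index of the nearest SOM node for every sample."""
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fit_des(W, X_val, y_val, models):
    """For each node, pick the base model with the lowest validation RMSE there.

    Assumed selection rule; nodes with no validation samples fall back to the
    globally best model.
    """
    nodes = assign_nodes(W, X_val)
    preds = np.column_stack([m(X_val) for m in models])   # (n_val, n_models)
    global_rmse = np.sqrt(((preds - y_val[:, None]) ** 2).mean(axis=0))
    best = np.full(len(W), int(global_rmse.argmin()))
    for k in range(len(W)):
        mask = nodes == k
        if mask.any():
            err = preds[mask] - y_val[mask][:, None]
            best[k] = int(np.sqrt((err ** 2).mean(axis=0)).argmin())
    return best

def predict_des(W, best, X, models):
    """Route each sample to the model selected for its SOM node."""
    nodes = assign_nodes(W, X)
    preds = np.column_stack([m(X) for m in models])
    return preds[np.arange(len(X)), best[nodes]]
```

Here `models` is a list of callables that map a feature matrix to a vector of predictions, so any pre-trained regressors can be wrapped and passed in. A static ensemble, by contrast, would apply one fixed combination of models to every sample regardless of where it falls in the feature space.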

Updated: 2024-04-10