Prediction of small-molecule compound solubility in organic solvents by machine learning algorithms
Journal of Cheminformatics (IF 7.1), Pub Date: 2021-12-11, DOI: 10.1186/s13321-021-00575-3
Zhuyifan Ye, Defang Ouyang

Rapid solvent selection is of great significance in chemistry. However, solubility prediction remains a crucial challenge. This study aimed to develop machine learning models that can accurately predict compound solubility in organic solvents. A dataset containing 5081 experimental temperature and solubility data points for compounds in organic solvents was extracted and standardized. Molecular fingerprints were selected to characterize structural features. LightGBM was compared with deep learning and traditional machine learning methods (PLS, Ridge regression, kNN, DT, ET, RF, SVM) to develop models for predicting solubility in organic solvents at different temperatures. Compared with the other models, LightGBM exhibited significantly better overall generalization (logS ± 0.20). For unseen solutes, the model gave a prediction accuracy (logS ± 0.59) close to the expected noise level of experimental solubility data. LightGBM also revealed the physicochemical relationship between solubility and structural features. The method enables rapid solvent screening in chemistry and may be applied to solubility prediction in other solvents.
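The abstract reports two accuracy figures, one overall and one for unseen solutes. The second kind of evaluation requires that no solute in the test set also appears in the training set. The sketch below shows one way such a solute-grouped split could be made. It is an illustration only, not the authors' protocol: the function name `split_by_solute`, the record layout, the 20% test fraction, and the placeholder records are all assumptions introduced here.

```python
import random


def split_by_solute(records, test_fraction=0.2, seed=0):
    """Split records so that every solute lands entirely in train or in test.

    `records` is a list of dicts with at least a "solute" key.
    This is a hypothetical helper, not taken from the paper.
    """
    # Collect the distinct solutes in a deterministic order, then shuffle.
    solutes = sorted({rec["solute"] for rec in records})
    random.Random(seed).shuffle(solutes)

    # Hold out a fraction of the solutes (at least one) for testing.
    n_test = max(1, round(len(solutes) * test_fraction))
    test_solutes = set(solutes[:n_test])

    train = [rec for rec in records if rec["solute"] not in test_solutes]
    test = [rec for rec in records if rec["solute"] in test_solutes]
    return train, test


# Placeholder records (made-up values, for illustration only).
example_records = [
    {"solute": "solute_A", "solvent": "solvent_1", "temperature_K": 298.15, "logS": 0.0},
    {"solute": "solute_A", "solvent": "solvent_2", "temperature_K": 308.15, "logS": 0.0},
    {"solute": "solute_B", "solvent": "solvent_1", "temperature_K": 298.15, "logS": 0.0},
    {"solute": "solute_C", "solvent": "solvent_2", "temperature_K": 318.15, "logS": 0.0},
    {"solute": "solute_D", "solvent": "solvent_1", "temperature_K": 298.15, "logS": 0.0},
]

train_records, test_records = split_by_solute(example_records, test_fraction=0.25)
```

A random split over individual records would instead let the same solute appear on both sides at different temperatures or in different solvents, which measures interpolation rather than performance on unseen compounds.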

Updated: 2021-12-11