Chemical Engineering Science (IF 4.7), Pub Date: 2022-07-23, DOI: 10.1016/j.ces.2022.117946
Yiming Ma, Yue Niu, Huaiyu Yang, Jiayu Dai, Jiawei Lin, Huiqi Wang, Songgu Wu, Qiuxiang Yin, Ling Zhou, Junbo Gong
This study reports a machine-learning (ML) method for developing multi-purpose prediction strategies for the formation of cyclodextrin inclusion complexes (ICs) in aqueous solution. A balanced dataset of pharmaceutically relevant molecules was constructed with experimental verification. Three ML models (an artificial neural network, a support vector machine, and logistic regression) were built and optimized to predict IC formation. To meet different prediction requirements more reliably, ML-based linear, recall-first, and precision-first strategies were then built on these models, with the recall-first and precision-first strategies targeting maximum recall and maximum precision, respectively. The recall-first strategy identified all positive samples, avoiding missed positives in the prediction, and the precision-first strategy identified positive samples accurately, reducing the number of validation experiments required. These are the first ML-based prediction strategies established for IC formation, and they showed high accuracy and reliability. They offer more efficient and lower-cost solutions for IC screening.
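The trade-off between a recall-first and a precision-first strategy can be illustrated with a minimal sketch. The abstract does not state how the strategies combine the three models, so the voting rules here are an assumption made only for illustration: the recall-first rule flags a molecule as IC-forming if any model predicts positive, and the precision-first rule does so only if all models agree. The function names, labels, and votes are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the combination rules below are assumed,
# not the authors' published method.

def recall_first(votes):
    """Predict positive if ANY model votes positive (assumed rule)."""
    return [int(any(v)) for v in votes]


def precision_first(votes):
    """Predict positive only if ALL models vote positive (assumed rule)."""
    return [int(all(v)) for v in votes]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, with 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth: 1 = forms an inclusion complex, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0]

# Hypothetical per-molecule votes from three models (ANN, SVM, LR).
votes = [
    (1, 1, 1),
    (1, 0, 1),
    (0, 0, 1),
    (0, 1, 0),
    (0, 0, 0),
    (0, 0, 0),
]

for name, strategy in [("recall-first", recall_first),
                       ("precision-first", precision_first)]:
    p, r = precision_recall(y_true, strategy(votes))
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data the recall-first rule recovers every positive at the cost of one false positive, while the precision-first rule returns no false positives but misses two of the three positives. This mirrors the use cases described in the abstract: the first avoids missed candidates, the second limits the number of validation experiments.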
Title (translated from the Chinese): Prediction and design of cyclodextrin inclusion complex formation via machine-learning-based strategies