Validating the validation: reanalyzing a large-scale comparison of deep learning and machine learning models for bioactivity prediction.
Journal of Computer-Aided Molecular Design (IF 3.0), Pub Date: 2020-01-20, DOI: 10.1007/s10822-019-00274-0
Matthew C Robinson¹, Robert C Glen²,³, Alpha A Lee¹

Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
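The abstract's point about ROC AUC versus precision–recall in virtual screening can be illustrated with a small, hypothetical sketch. This is not the authors' numerical experiment, and the data are invented for illustration. Two rankings of an assumed library of 1000 compounds containing 10 actives are scored with ROC AUC and with average precision, a common summary of the area under the precision–recall curve. The helper names `roc_auc`, `average_precision` and `from_ranking` are our own.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random decoy (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k taken at the rank k of each active (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def from_ranking(ordered_labels: List[int]) -> Tuple[List[int], List[float]]:
    """Turn a best-first list of labels into (labels, scores) with strictly decreasing scores."""
    n = len(ordered_labels)
    return list(ordered_labels), [float(n - i) for i in range(n)]


# Hypothetical library: 10 actives among 1000 compounds (1% prevalence).
# Ranking A: 50 decoys first, then all 10 actives, then the remaining 940 decoys.
ranking_a = [0] * 50 + [1] * 10 + [0] * 940
# Ranking B: 5 actives at the very top, then all 990 decoys, then the last 5 actives.
ranking_b = [1] * 5 + [0] * 990 + [1] * 5

if __name__ == "__main__":
    for name, order in [("A", ranking_a), ("B", ranking_b)]:
        y, s = from_ranking(order)
        print(f"{name}: ROC AUC = {roc_auc(y, s):.3f}, average precision = {average_precision(y, s):.3f}")
    # A: ROC AUC = 0.949, average precision = 0.097
    # B: ROC AUC = 0.500, average precision = 0.504
```

In this toy example, ranking A looks excellent by ROC AUC even though no active appears among the top 50 compounds, while ranking B scores no better than random by ROC AUC yet places half of the actives at the very top, which is what matters when only the highest-ranked compounds are sent for testing. Average precision tracks that early enrichment, which is one reason to report the precision–recall curve alongside the ROC curve rather than relying on ROC AUC alone.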



Updated: 2020-01-20