Cell Reports Physical Science (IF 8.9), Pub Date: 2021-09-13, DOI: 10.1016/j.xcrp.2021.100573
Kuan Lee, Ann Yang, Yen-Chu Lin, Daniel Reker, Gonçalo J.L. Bernardes, Tiago Rodrigues
Biological screens are plagued by false-positive hits resulting from aggregation. Methods to triage small colloidally aggregating molecules (SCAMs) are in high demand. Herein, we disclose a neural network to flag such entities. Our data demonstrate the utility of machine learning for predicting SCAMs, achieving 80% correct predictions in an out-of-sample evaluation. The tool is competitive with a panel of expert chemists, who correctly predict 61% ± 7% of the same molecules in a Turing-like test. Our computational routine provides insight into features governing aggregation that had remained hidden to expert intuition. Further, we quantify that up to 15%–20% of ligands in publicly available chemogenomic databases have high potential to aggregate at a typical screening concentration (30 μM), which calls for caution in systems biology and drug design programs. Our approach provides a means to augment human intuition and mitigate attrition, and a pathway to accelerate future molecular medicine.
Title: Combating small-molecule aggregation with machine learning