Leveraging Uncertainty in Machine Learning Accelerates Biological Discovery and Design
Cell Systems (IF 9.0), Pub Date: 2020-10-15, DOI: 10.1016/j.cels.2020.09.007
Brian Hie, Bryan D. Bryson, Bonnie Berger

Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72 compounds to make predictions over a 10,833-compound library, identifying and experimentally validating compounds with nanomolar affinity for diverse kinases and whole-cell growth inhibition of Mycobacterium tuberculosis. Uncertainty facilitates a tight iterative loop between computation and experimentation and generalizes across biological domains as diverse as protein engineering and single-cell transcriptomics. More broadly, our work demonstrates that uncertainty should play a key role in the increasing adoption of machine learning algorithms into the experimental lifecycle.
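To make the idea of Gaussian process-based uncertainty concrete, here is a minimal sketch. It is not the authors' implementation. The RBF kernel, the synthetic 8-dimensional features standing in for pre-trained embeddings, the toy target, and the upper-confidence-bound ranking rule (`rank_by_ucb`) are all assumptions chosen for illustration. Only the training-set size of 72 is taken from the abstract.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt += noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^{-1} y, computed via the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    # The RBF prior variance k(x, x) is 1 for every x.
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def rank_by_ucb(mean, std, beta=1.0):
    """Hypothetical acquisition rule: sort candidates by mean + beta * std."""
    return np.argsort(-(mean + beta * std))


# Synthetic stand-ins (assumptions): 72 "compounds" with 8 random features.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(72, 8))
y_train = np.sin(x_train[:, 0])

# Candidates close to the training data vs. far outside its distribution.
x_near = x_train[:5] + 0.01 * rng.normal(size=(5, 8))
x_far = x_train[:5] + 10.0

mean_near, std_near = gp_posterior(x_train, y_train, x_near, length_scale=2.0)
mean_far, std_far = gp_posterior(x_train, y_train, x_far, length_scale=2.0)

order = rank_by_ucb(
    np.concatenate([mean_near, mean_far]),
    np.concatenate([std_near, std_far]),
)
```

In this sketch the predicted standard deviation stays small near the training points and returns to the prior value far from them, which is the signal that lets a model flag predictions outside its training distribution. The `beta` parameter controls how strongly the ranking weighs uncertainty against the predicted value.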




Updated: 2020-11-18