TOP: A Deep Mixture Representation Learning Method for Boosting Molecular Toxicity Prediction
Methods (IF 4.8), Pub Date: 2020-07-01, DOI: 10.1016/j.ymeth.2020.05.013
Yuzhong Peng¹, Ziqiao Zhang², Qizhi Jiang², Jihong Guan³, Shuigeng Zhou²

In the early stages of drug discovery, molecular toxicity prediction is crucial for excluding drug candidates that are likely to fail in clinical trials. In this paper, we present a novel molecular representation method and develop a corresponding deep learning-based framework called TOP (short for TOxicity Prediction). TOP integrates specifically designed data preprocessing methods, a recurrent neural network (RNN) based on bidirectional gated recurrent units (BiGRU), and fully connected neural networks for end-to-end molecular representation learning and chemical toxicity prediction. TOP automatically learns a mixed molecular representation from both the SMILES contextual information that describes the molecular structure and the molecule's physicochemical properties. TOP therefore overcomes a drawback of existing methods, which use only one of these two sources of information, and greatly improves toxicity prediction accuracy. We conducted extensive experiments on 14 classic toxicity prediction tasks across three benchmark datasets, including both balanced and imbalanced ones. The results show that, with the help of the novel molecular representation method, TOP significantly outperforms not only three baseline machine learning methods but also five state-of-the-art methods.
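The abstract does not give TOP's tokenization, layer sizes, descriptor set, or training procedure, so the following is only a minimal, untrained sketch of the general "mixed representation" idea under stated assumptions. A SMILES string is split into tokens (treating Cl and Br as single tokens, an assumption), one-hot encoded, and read by a forward and a backward GRU. The two final hidden states are concatenated with a vector of precomputed physicochemical descriptors and passed through a fully connected layer to one sigmoid output per toxicity task. All names (`tokenize_smiles`, `GRUCell`, `MixedToxicityModel`), the bias-free GRU, the dimensions, and the descriptor values are hypothetical illustrations, not the authors' code. Because it uses only the Python standard library and random weights, its outputs show shapes and data flow, not real toxicity predictions.

```python
import math
import random


def tokenize_smiles(smiles):
    """Hypothetical tokenizer: single characters, but keep the halogens Cl and Br whole."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def one_hot(token, vocab):
    vec = [0.0] * len(vocab)
    vec[vocab[token]] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Bias-free GRU cell (assumption): update gate z, reset gate r, candidate state n."""

    def __init__(self, input_dim, hidden_dim, rng):
        cols = input_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.Wz = rand_matrix(hidden_dim, cols, rng)
        self.Wr = rand_matrix(hidden_dim, cols, rng)
        self.Wn = rand_matrix(hidden_dim, cols, rng)

    def step(self, x, h):
        xh = x + h  # list concatenation [x; h]
        z = [sigmoid(a) for a in matvec(self.Wz, xh)]
        r = [sigmoid(a) for a in matvec(self.Wr, xh)]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the old state
        n = [math.tanh(a) for a in matvec(self.Wn, x + rh)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        for x in xs:
            h = self.step(x, h)
        return h  # final hidden state


class MixedToxicityModel:
    """Hypothetical TOP-style model: BiGRU(SMILES) + descriptors -> FC (ReLU) -> per-task sigmoid."""

    def __init__(self, vocab, n_descriptors, hidden_dim=8, fc_dim=16, n_tasks=3, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.fwd = GRUCell(len(vocab), hidden_dim, rng)
        self.bwd = GRUCell(len(vocab), hidden_dim, rng)
        self.W1 = rand_matrix(fc_dim, 2 * hidden_dim + n_descriptors, rng)
        self.W2 = rand_matrix(n_tasks, fc_dim, rng)

    def predict(self, smiles, descriptors):
        xs = [one_hot(t, self.vocab) for t in tokenize_smiles(smiles)]
        h_fwd = self.fwd.run(xs)        # reads the SMILES left to right
        h_bwd = self.bwd.run(xs[::-1])  # reads it right to left
        mixed = h_fwd + h_bwd + list(descriptors)  # the mixed representation
        hidden = [max(0.0, a) for a in matvec(self.W1, mixed)]  # ReLU layer
        return [sigmoid(a) for a in matvec(self.W2, hidden)]    # one probability per task


# Usage with two toy molecules and two hypothetical, already-scaled descriptor values.
smiles_list = ["CCO", "c1ccccc1Cl"]
vocab = {t: i for i, t in enumerate(sorted({t for s in smiles_list for t in tokenize_smiles(s)}))}
model = MixedToxicityModel(vocab, n_descriptors=2)
probs = model.predict("CCO", [0.5, -0.3])
```

Concatenating the two GRU summaries with the descriptor vector before the fully connected layer is what lets a single classifier see both structural context and physicochemical properties; in a real pipeline the descriptors would come from a cheminformatics toolkit and the weights would be trained on labeled toxicity data.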

Updated: 2020-07-01