Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network, Journal of Cheminformatics

Journal of Cheminformatics (IF 7.1), Pub Date: 2021-11-27, DOI: 10.1186/s13321-021-00570-8
Jiarui Chen (1), Yain-Whar Si (1), Chon-Wai Un (1), Shirley W I Siu (1, 2, 3)

As safety is one of the most important properties of drugs, chemical toxicology prediction has received increasing attention in drug discovery research. Traditionally, researchers rely on in vitro and in vivo experiments to test the toxicity of chemical compounds. However, these experiments are time-consuming and costly, and those that involve animal testing are increasingly subject to ethical concerns. While traditional machine learning (ML) methods have been used in the field with some success, the limited availability of annotated toxicity data is the major hurdle to further improving model performance. Inspired by the success of semi-supervised learning (SSL) algorithms, we propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and train the network with the Mean Teacher (MT) SSL algorithm. Using the Tox21 data, our optimal SSL-GCN models for predicting the twelve toxicological endpoints achieve an average ROC-AUC score of 0.757 on the test set, which is a 6% improvement over GCN models trained by supervised learning and conventional ML methods. Our SSL-GCN models also exhibit superior performance when compared to models constructed using the built-in DeepChem ML methods. This study demonstrates that SSL can increase the predictive power of models by learning from unannotated data. The optimal ratio of unannotated to annotated data ranges between 1:1 and 4:1. These results demonstrate the success of SSL in chemical toxicity prediction; the same technique is expected to benefit other chemical property prediction tasks by utilizing existing large chemical databases. Our optimal SSL-GCN model is hosted on an online server accessible at https://app.cbbio.online/ssl-gcn/home .
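
For readers unfamiliar with the Mean Teacher algorithm named in the abstract, the following is a minimal sketch of its two core ingredients as they are generally described: a teacher model whose weights are an exponential moving average (EMA) of the student's weights, and a loss that adds a student-teacher consistency term computed on both annotated and unannotated samples. It is not the authors' implementation. The function names, the `alpha` and `consistency_weight` values, and the use of plain lists of probabilities in place of real GCN outputs are all assumptions made for illustration.

```python
import math


def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Return teacher weights moved toward the student weights by an EMA step.

    alpha is a hypothetical smoothing factor; the abstract does not state one.
    """
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]


def binary_cross_entropy(prob, label, eps=1e-12):
    """Cross-entropy for one predicted probability and one 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_teacher_loss(student_probs, teacher_probs, labels,
                      consistency_weight=1.0):
    """Supervised loss on annotated samples plus consistency loss on all samples.

    labels[i] is 0 or 1 for an annotated molecule and None for an
    unannotated one (an encoding assumed here for illustration).
    """
    annotated = [(p, y) for p, y in zip(student_probs, labels) if y is not None]
    supervised = 0.0
    if annotated:
        supervised = sum(binary_cross_entropy(p, y)
                         for p, y in annotated) / len(annotated)
    # Mean squared difference between student and teacher predictions,
    # computed on every sample whether or not it has a label.
    consistency = sum((s - t) ** 2
                      for s, t in zip(student_probs, teacher_probs)) / len(student_probs)
    return supervised + consistency_weight * consistency


if __name__ == "__main__":
    # One annotated (toxic) molecule and one unannotated molecule.
    loss = mean_teacher_loss([0.5, 0.2], [0.5, 0.4], [1, None])
    print("combined loss:", loss)
    print("updated teacher:", ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9))
```

In this sketch the unannotated molecule contributes only through the consistency term, which is how unlabeled data can influence training without a toxicity label.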

Updated: 2021-11-27
