In silico prediction of chemical genotoxicity using machine learning methods and structural alerts†, Toxicology Research

Toxicology Research (IF 2.1), Pub Date: 2017-12-15, DOI: 10.1039/c7tx00259a
Defang Fan, Hongbin Yang, Fuxing Li, Lixia Sun, Peiwen Di, Weihua Li, Yun Tang, Guixia Liu


Genotoxicity tests can detect compounds that have an adverse effect on the process of heredity. The in vivo micronucleus assay, a genotoxicity test method, has been widely used to evaluate the presence and extent of chromosomal damage in human beings. Due to the high cost and laboriousness of experimental tests, computational approaches for predicting genotoxicity based on chemical structures and properties are recognized as an alternative. In this study, a dataset containing 641 diverse chemicals was collected and the molecules were represented by both fingerprints and molecular descriptors. Then classification models were constructed by six machine learning methods, including the support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), C4.5 decision tree (DT), random forest (RF) and artificial neural network (ANN). The performance of the models was estimated by five-fold cross-validation and an external validation set. The top ten models showed excellent performance for the external validation with accuracies ranging from 0.846 to 0.938, among which models Pubchem_SVM and MACCS_RF showed a more reliable predictive ability. The applicability domain was also defined to distinguish favorable predictions from unfavorable ones. Finally, ten structural fragments which can be used to assess the genotoxicity potential of a chemical were identified by using information gain and structural fragment frequency analysis. Our models might be helpful for the initial screening of potential genotoxic compounds.
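The information-gain step mentioned in the abstract can be illustrated with a small sketch. It assumes each compound is reduced to a presence/absence flag for one structural fragment plus a binary genotoxicity label, and it scores the fragment by how much knowing the flag reduces the entropy of the labels. The function names and the toy data are hypothetical and are not taken from the paper, whose actual fragment enumeration and frequency analysis are not described in the abstract.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        result -= p * math.log2(p)
    return result


def information_gain(has_fragment: Sequence[bool], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting compounds on fragment presence."""
    if len(has_fragment) != len(labels):
        raise ValueError("has_fragment and labels must have the same length")
    n = len(labels)
    if n == 0:
        raise ValueError("at least one compound is required")
    # Labels of compounds that contain / do not contain the fragment.
    present = [y for f, y in zip(has_fragment, labels) if f]
    absent = [y for f, y in zip(has_fragment, labels) if not f]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Toy data (assumed for illustration): 1 = genotoxic, 0 = non-genotoxic.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A fragment found only in the genotoxic compounds is maximally informative.
informative_fragment = [True, True, True, True, False, False, False, False]
# A fragment spread evenly over both classes carries no information.
uninformative_fragment = [True, False, True, False, True, False, True, False]

if __name__ == "__main__":
    print(information_gain(informative_fragment, toy_labels))
    print(information_gain(uninformative_fragment, toy_labels))
```

Ranking candidate fragments by a score like this and keeping the highest-scoring ones is one plausible way to shortlist structural alerts; the thresholds and fragment definitions used in the study itself would have to come from the full paper.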


Updated: 2017-12-15
