New clues on carcinogenicity-related substructures derived from mining two large datasets of chemical compounds.
Journal of Environmental Science and Health, Part C ( IF 1.650 ) Pub Date : 2016-03-18 , DOI: 10.1080/10590501.2016.1166879
Azadi Golbamaki 1 , Emilio Benfenati 1 , Nazanin Golbamaki 2 , Alberto Manganaro 1 , Erinc Merdivan 3 , Alessandra Roncaglioni 1 , Giuseppina Gini 4

In this study, new molecular fragments associated with genotoxic and nongenotoxic carcinogens are introduced to estimate the carcinogenic potential of compounds. Two rule-based carcinogenesis models were developed with the aid of SARpy: model R (from rodents' experimental data) and model E (from human carcinogenicity data). The structural alert extraction method of SARpy is completely automated and unbiased, and it yields alerts with statistical significance. The carcinogenicity models developed in this study are collections of potentially carcinogenic fragments extracted from two carcinogenicity databases: the ANTARES carcinogenicity dataset, with information from bioassays on rats, and the combination of the ISSCAN and CGX datasets, which takes into account human-based assessment. The performance of the two models was evaluated by cross-validation and by external validation on a 258-compound case study dataset. Combining the R and H predictions, and scoring a positive or negative result only when both models are concordant on a prediction, increased accuracy to 72% and specificity to 79% on the external test set. The carcinogenic fragments present in the two models were compared and analyzed from the point of view of chemical class. The results show that the developed rule sets can be a useful tool for identifying some new structural alerts of carcinogenicity and can provide useful information on the molecular structures of carcinogenic chemicals.
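
The consensus step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how discordant predictions are handled, so the sketch assumes such compounds simply receive no call and are excluded from the metrics. The function names and the toy labels are hypothetical and are not data from the study.

```python
from typing import List, Optional, Sequence, Tuple


def consensus(pred_a: int, pred_b: int) -> Optional[int]:
    """Return the shared label (1 = carcinogen, 0 = non-carcinogen)
    when both models agree; return None (no call) otherwise.
    Assumption: discordant compounds are left unclassified."""
    return pred_a if pred_a == pred_b else None


def accuracy_and_specificity(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Compute accuracy and specificity over compounds that received a call."""
    called = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in called if t == 1 and p == 1)
    tn = sum(1 for t, p in called if t == 0 and p == 0)
    fp = sum(1 for t, p in called if t == 0 and p == 1)
    accuracy = (tp + tn) / len(called) if called else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, specificity


# Toy labels, invented purely for illustration.
y_true = [1, 0, 0, 1, 0, 1]
pred_first = [1, 0, 1, 1, 0, 0]   # predictions of the first model
pred_second = [1, 0, 0, 1, 1, 0]  # predictions of the second model

combined: List[Optional[int]] = [
    consensus(a, b) for a, b in zip(pred_first, pred_second)
]
acc, spec = accuracy_and_specificity(y_true, combined)
print(combined, acc, spec)
```

Requiring agreement trades coverage for reliability: fewer compounds receive a prediction, but the calls that remain are supported by both rule sets.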

Updated: 2019-11-01