Modelling of ready biodegradability based on combined public and industrial data sources.
SAR and QSAR in Environmental Research (IF 3), Pub Date: 2019-12-20, DOI: 10.1080/1062936x.2019.1697360
F Lunghini 1,2, G Marcou 1, P Gantzer 1, P Azam 2, D Horvath 1, E Van Miert 2, A Varnek 1

The European Registration, Evaluation, Authorization and Restriction of Chemical Substances Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB), and considers in silico prediction a valid alternative to experimental testing. However, currently available models may not be suitable for predicting compounds of industrial interest, owing to limited accuracy and restricted applicability domains. In this work, we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models performed well in terms of predictive power (Balanced Accuracy (BA) = 0.74–0.79) and data coverage (83–91%). The Generative Topographic Mapping approach identified several chemotypes and structural motifs unique to the industrial dataset, highlighting the chemical classes for which currently available models may give less reliable predictions. Finally, the public and industrial data were merged into a Global dataset containing 3146 compounds. This is the largest dataset reported in the literature so far, and it covers some chemotypes absent from the public data. Thus, the predictive model developed on the Global dataset has a larger applicability domain than the existing ones.
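
The abstract reports balanced accuracy and data coverage without saying how they were computed. As a rough illustration only, the sketch below uses the usual definition of balanced accuracy (the mean of sensitivity and specificity) and treats coverage as the fraction of compounds that receive a prediction. The label convention (1 = readily biodegradable, 0 = not), the use of None for compounds outside the applicability domain, the choice to score only covered compounds, and the toy data are all assumptions made for this example, not details taken from the paper.

```python
from typing import Optional, Sequence


def coverage(y_pred: Sequence[Optional[int]]) -> float:
    """Fraction of compounds that received a prediction (None = no prediction)."""
    return sum(p is not None for p in y_pred) / len(y_pred)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[Optional[int]]) -> float:
    """Mean of sensitivity and specificity, scored on covered compounds only.

    Assumption: compounds without a prediction are excluded from the score.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present among covered compounds")
    sensitivity = tp / (tp + fn)  # recall on readily biodegradable compounds
    specificity = tn / (tn + fp)  # recall on non-readily biodegradable compounds
    return (sensitivity + specificity) / 2


# Toy data (hypothetical, not from the paper): 8 compounds, 2 left unpredicted.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, None, None]

ba = balanced_accuracy(y_true, y_pred)
cov = coverage(y_pred)
print(f"BA = {ba:.2f}, coverage = {cov:.0%}")
```

Balanced accuracy is commonly preferred over plain accuracy when the two classes are unevenly represented, because each class contributes equally to the score.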




Updated: 2020-03-20