QSPR models for bioconcentration factor (BCF): are they able to predict data of industrial interest?
SAR and QSAR in Environmental Research (IF 3), Pub Date: 2019-06-27, DOI: 10.1080/1062936x.2019.1626278
F Lunghini 1, 2 , G Marcou 1 , P Azam 2 , R Patoux 2 , M H Enrici 2 , F Bonachera 1 , D Horvath 1 , A Varnek 1

The bioconcentration factor (BCF), a key parameter required by the REACH regulation, estimates the tendency of a xenobiotic to concentrate inside living organisms. In silico methods can be valid alternatives to costly experimental measurements. However, in the industrial context, these theoretical approaches may fail to predict BCF with reasonable accuracy. We analyzed whether models built only on public data perform adequately when challenged to predict industrial compounds. A new set of 1129 compounds was collected by merging publicly available datasets. Generative Topographic Mapping was employed to compare this chemical space with a set of new compounds from industry. Some new chemotypes absent from the training set (such as siloxanes) were detected. A new BCF model was built using ISIDA (In SIlico design and Data Analysis) fragment descriptors together with support vector regression and random forest machine-learning methods. It was externally validated on (i) data collected from the literature and (ii) industrial data. The latter also served as a benchmark for the freely available tools VEGA, EPISuite, TEST and OPERA. The new model performs comparably to existing ones (RMSE of 0.58 log BCF units) but benefits from an extended applicability domain, covering the chemical space of the industrial set (78% data coverage).
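The two figures quoted at the end of the abstract, RMSE in log BCF units and data coverage, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the boolean in-domain flags, and the choice to compute RMSE only over compounds inside the applicability domain are assumptions made for illustration, and the numbers in the usage lines are invented.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted log BCF values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def coverage(in_domain):
    """Fraction of compounds flagged as inside the applicability domain."""
    if not in_domain:
        raise ValueError("in_domain must be non-empty")
    return sum(1 for flag in in_domain if flag) / len(in_domain)


def evaluate(observed, predicted, in_domain):
    """Return (RMSE over in-domain compounds, data coverage).

    Assumption: out-of-domain predictions are excluded from the RMSE.
    The abstract does not state how the two figures were combined.
    """
    kept = [(o, p) for o, p, flag in zip(observed, predicted, in_domain) if flag]
    if not kept:
        raise ValueError("no compound falls inside the applicability domain")
    obs, pred = zip(*kept)
    return rmse(list(obs), list(pred)), coverage(in_domain)


# Hypothetical log BCF values, for illustration only.
observed = [1.0, 2.0, 3.0, 0.5]
predicted = [1.5, 2.0, 2.0, 3.0]
in_domain = [True, True, True, False]
error, covered = evaluate(observed, predicted, in_domain)
print(f"RMSE = {error:.2f} log units, coverage = {covered:.0%}")
```

Reporting both values together matters because a model can lower its RMSE simply by declining to predict difficult compounds; the coverage figure shows how much of the test set the error estimate actually applies to.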




Updated: 2019-06-27