Consensus models to predict oral rat acute toxicity and validation on a dataset coming from the industrial context.
SAR and QSAR in Environmental Research (IF 3), Pub Date: 2019-10-14, DOI: 10.1080/1062936x.2019.1672089
F Lunghini 1,2, G Marcou 1, P Azam 2, D Horvath 1, R Patoux 2, E Van Miert 2, A Varnek 1

We report predictive models of acute oral systemic toxicity, a follow-up to our previous work within the NICEATM project. The study updates the original models with new data and validates them externally on a dataset relevant to the chemical industry. A regression model for LD50 and a multi-class classification model for toxicity classes defined by the Globally Harmonized System (GHS) categories were prepared. ISIDA descriptors were used to encode molecular structures. The machine learning algorithms were support vector machine (SVM), random forest (RF) and naïve Bayes. Selected individual models were combined into a consensus. The datasets were compared using generative topographic mapping, which showed that the NICEATM datasets lacked some chemotypes relevant to the chemical industry. The new models, trained on enlarged datasets, have applicability domains (AD) large enough to accommodate industrial compounds: the fraction of compounds inside the models' AD increased from 58% (NICEATM model) to 94% (new model). Enlarging the training sets also improved prediction performance: RMSE decreased from 0.56 (NICEATM models) to 0.47 (new models), and balanced accuracy increased from 0.69 to 0.71.
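
The quantities named in the abstract (consensus prediction, RMSE, balanced accuracy, and the fraction of compounds inside the AD) can be illustrated with a minimal Python sketch. The abstract does not say how the individual models were combined, so the unweighted mean used here is an assumption, and all function names and toy values are hypothetical rather than taken from the paper.

```python
from math import sqrt
from statistics import mean


def consensus_prediction(per_model_predictions):
    """Average several models' predictions, compound by compound.

    Assumption: an unweighted mean; the paper's actual scheme may differ.
    """
    # per_model_predictions holds one list per model, all of equal length.
    return [mean(values) for values in zip(*per_model_predictions)]


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        # Predictions made for the compounds whose true class is cls.
        predicted = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in predicted) / len(predicted))
    return mean(recalls)


def ad_coverage(inside_ad_flags):
    """Fraction of compounds flagged as inside the applicability domain."""
    return sum(inside_ad_flags) / len(inside_ad_flags)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    observed = [2.0, 3.0, 4.0]
    model_a = [2.2, 2.8, 4.4]
    model_b = [1.8, 3.0, 4.0]
    consensus = consensus_prediction([model_a, model_b])
    print("consensus:", consensus)
    print("RMSE:", rmse(observed, consensus))
    print("balanced accuracy:", balanced_accuracy([1, 1, 2, 2], [1, 2, 2, 2]))
    print("AD coverage:", ad_coverage([True, True, False, True]))
```

Balanced accuracy is computed here as the mean of per-class recalls, which keeps a rare toxicity class from being swamped by a frequent one. Whether the paper used exactly this definition for its multi-class models is not stated in the abstract.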




Updated: 2019-10-14