Combining In Vivo Data with In Silico Predictions for Modeling Hepatic Steatosis by Using Stratified Bagging and Conformal Prediction
Chemical Research in Toxicology (IF 3.7), Pub Date: 2020-12-21, DOI: 10.1021/acs.chemrestox.0c00511
Sankalp Jain 1, Ulf Norinder 2, Sylvia E Escher 3, Barbara Zdrazil 1

Hepatic steatosis (fatty liver) is a severe liver disease induced by the excessive accumulation of fatty acids in hepatocytes. In this study, we developed reliable in silico models for predicting hepatic steatosis on the basis of an in vivo data set of 1041 compounds measured in rodent studies with repeated oral exposure. The imbalanced nature of the data set (1:8, with the “steatotic” compounds belonging to the minority class) required the use of meta-classifiers—bagging with stratified under-sampling and Mondrian conformal prediction—on top of the base classifier random forest. One major goal was the investigation of the influence of different descriptor combinations on model performance (tested by predicting an external validation set): physicochemical descriptors (RDKit), ToxPrint features, as well as predictions from in silico nuclear receptor and transporter models. All models based upon descriptor combinations including physicochemical features led to reasonable balanced accuracies (BAs between 0.65 and 0.69 for the respective models). Combining physicochemical features with transporter predictions and further with ToxPrint features gave the best performing model (BAs up to 0.7 and efficiencies of 0.82). Whereas both meta-classifiers proved useful for this highly imbalanced toxicity data set, the conformal prediction framework also guarantees the error level and thus might be favored for future studies in the field of predictive toxicology.
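The two meta-classifier ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the base classifier is left out, and the nonconformity scores are assumed to be something like 1 minus the predicted probability of a class. Stratified under-sampling builds each bag from an equal number of samples per class, and a Mondrian (class-conditional) conformal predictor calibrates p-values separately within each class, which is what lets the error level hold for the minority class as well.

```python
import random


def stratified_undersample(labels, rng):
    """Return indices of one balanced bag: every class is randomly
    down-sampled to the size of the smallest class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    bag = []
    for idx in by_class.values():
        bag.extend(rng.sample(idx, n_min))
    return sorted(bag)


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional conformal p-values.

    cal_scores: nonconformity of each calibration sample w.r.t. its true class.
    test_scores: dict mapping class -> nonconformity of the test sample
                 under the hypothesis that it belongs to that class.
    """
    p_values = {}
    for c, a in test_scores.items():
        # Mondrian step: compare only against calibration samples of class c.
        same = [s for s, y in zip(cal_scores, cal_labels) if y == c]
        p_values[c] = (sum(s >= a for s in same) + 1) / (len(same) + 1)
    return p_values


def prediction_set(p_values, significance):
    """Keep every class whose p-value exceeds the chosen significance level."""
    return {c for c, p in p_values.items() if p > significance}


# Toy data with the 1:8 imbalance mentioned in the abstract (values made up).
labels = [1] * 10 + [0] * 80
bag = stratified_undersample(labels, random.Random(0))

cal_labels = [1, 1, 1, 1] + [0] * 9
cal_scores = [0.1, 0.2, 0.3, 0.4] + [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p = mondrian_p_values(cal_scores, cal_labels, {0: 0.3, 1: 0.7})

print(len(bag), p, prediction_set(p, 0.2), prediction_set(p, 0.1))
```

A prediction set with exactly one label is a single-label prediction; efficiency, as the term is commonly used for conformal classifiers, is the fraction of test compounds that receive such a set. Lowering the significance level makes the sets larger, so the error guarantee is traded against efficiency.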

Updated: 2021-02-15