Integration of human cell lines gene expression and chemical properties of drugs for Drug Induced Liver Injury prediction
Biology Direct (IF 5.5), Pub Date: 2021-01-09, DOI: 10.1186/s13062-020-00286-z
Wojciech Lesiński¹, Krzysztof Mnich², Agnieszka Kitlas Golińska¹, Witold R Rudnicki¹˒²

Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI can significantly reduce the cost of clinical trials. In this work we examined whether the occurrence of DILI can be predicted using gene expression profiles of cancer cell lines and the chemical properties of drugs. We used gene expression profiles from 13 human cell lines, as well as molecular properties of drugs, to build machine learning (ML) models of DILI. To this end, we used a robust cross-validated protocol based on feature selection and the Random Forest algorithm. In this protocol we first identify the most informative variables and then use them to build predictive models. The models are first built separately on data from single cell lines and on chemical properties. They are then integrated using the Super Learner method, with several different underlying methods for integration. The entire modelling process is performed using nested cross-validation. We obtained weakly predictive ML models when using either molecular descriptors or some individual cell lines (AUC between 0.55 and 0.61). Models obtained with the Super Learner approach have significantly improved predictive performance (AUC = 0.73), which allows substances to be divided into two categories: low-risk and high-risk.
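The abstract does not give implementation details, so the following is only a minimal, stdlib-only Python sketch of the general stacking idea behind the integration step, under loudly stated assumptions. The "level-one" inputs (one column per base model, standing in for per-cell-line and chemical-descriptor models) are synthetic scores rather than out-of-fold Random Forest predictions. The meta-learner is plain logistic regression, which is just one possible choice; the authors report using several integration methods that are not named here. Only the outer cross-validation loop around the meta-learner is shown, not the full nested protocol with feature selection. All function names (`auc`, `kfold_indices`, `super_learner_cv_scores`, etc.) are hypothetical.

```python
import math
import random


def auc(scores, labels):
    """ROC AUC: probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Hypothetical meta-learner: logistic regression fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_logistic(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def super_learner_cv_scores(Z, y, k=5, seed=0):
    """Out-of-fold meta-learner predictions: fit on k-1 folds of level-one data Z, predict the held-out fold."""
    out = [0.0] * len(y)
    for test in kfold_indices(len(y), k, seed):
        held = set(test)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_logistic([Z[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict_logistic(model, [Z[i] for i in test])):
            out[i] = p
    return out


# Synthetic level-one data (hypothetical): 200 compounds, 5 weak base models,
# each score = 0.5 * label + Gaussian noise, so every single model is only weakly predictive.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(200)]
Z = [[0.5 * yi + rng.gauss(0.0, 1.0) for _ in range(5)] for yi in y]

single_aucs = [auc([row[j] for row in Z], y) for j in range(5)]
stacked_auc = auc(super_learner_cv_scores(Z, y), y)
print("single-model AUCs:", [round(a, 2) for a in single_aucs])
print("stacked AUC:", round(stacked_auc, 2))
```

The exact numbers depend on the random seed and say nothing about the paper's results. The design point the sketch tries to show is that the meta-learner is both trained and evaluated on predictions it did not fit itself, so combining several weak, partly independent base models can raise the held-out AUC above that of any single one.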

Updated: 2021-01-10