Toxicity prediction of small drug molecules of androgen receptor using multilevel ensemble model
Journal of Bioinformatics and Computational Biology (IF 1), Pub Date: 2019-10-14, DOI: 10.1142/s0219720019500331
Vishan Kumar Gupta, Prashant Singh Rana

This study develops a quantitative structure–activity relationship (QSAR)-based model for predicting toxicity, with the aim of reducing animal testing, time, and cost in the early stages of drug development. An efficient machine learning model is developed to predict the toxicity of drug molecules that bind to the androgen receptor (AR). Toxicity is predicted in terms of activity, activity score, potency, and efficacy, using various physicochemical properties. A multilevel ensemble model is proposed: the first level performs ensemble-based classification of activity, and the second level performs ensemble-based regression of activity score, potency, and efficacy for only those drug molecules found active at the classification level. The AR dataset contains 10,273 drug molecules, of which 461 are active and 9,812 are inactive, and each drug molecule has 1,444 features. The dataset is therefore highly imbalanced and has a very large number of features. We first perform feature selection and then resolve the class imbalance problem. [Formula: see text]-fold cross-validation is used to measure the consistency of the model. Finally, the proposed multilevel ensemble model is validated and compared with some existing models.
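
The two-level routing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the base learners or the way their outputs are combined, so majority voting at level 1 and simple averaging at level 2 are assumptions, and the function names and toy threshold models below are hypothetical placeholders for trained models.

```python
from statistics import mean
from typing import Callable, Dict, Mapping, Optional, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], int]    # returns 1 (active) or 0 (inactive)
Regressor = Callable[[Features], float]


def majority_vote(classifiers: Sequence[Classifier], x: Features) -> int:
    """Level 1 (assumed combiner): active if more than half of the classifiers vote active."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0


def predict_multilevel(
    classifiers: Sequence[Classifier],
    regressors: Mapping[str, Sequence[Regressor]],
    x: Features,
) -> Tuple[int, Optional[Dict[str, float]]]:
    """Run level 1, then run level 2 only for molecules classified as active."""
    if majority_vote(classifiers, x) == 0:
        return 0, None  # inactive molecules never reach the regression level
    # Level 2 (assumed combiner): average the regressors for each target.
    scores = {target: mean(reg(x) for reg in regs) for target, regs in regressors.items()}
    return 1, scores


# Hypothetical toy base models on a 2-feature vector (stand-ins for trained models).
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]
regressors = {
    "activity_score": [lambda x: 50.0 * x[0], lambda x: 40.0 * x[1]],
    "potency": [lambda x: 10.0 * x[0], lambda x: 20.0 * x[1]],
    "efficacy": [lambda x: 100.0 * x[1]],
}

print(predict_multilevel(classifiers, regressors, [1.0, 0.75]))  # routed to level 2
print(predict_multilevel(classifiers, regressors, [0.1, 0.2]))   # stops at level 1
```

Because the regressors are applied only to molecules that pass the classification level, the second level would be fitted and evaluated on the active subset alone (461 of the 10,273 molecules in the AR dataset), which is one reason the class imbalance matters for both levels.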

Updated: 2019-10-14