Machine learning prediction of oncology drug targets based on protein and network properties.
BMC Bioinformatics (IF 3), Pub Date: 2020-03-14, DOI: 10.1186/s12859-020-3442-9
Zoltán Dezső, Michele Ceccarelli

BACKGROUND The selection and prioritization of drug targets is a central problem in drug discovery. Computational approaches can leverage the growing volume of large-scale human genomics and proteomics data to perform in-silico target identification, reducing the cost and time required.

RESULTS We developed a machine learning approach that scores proteins to generate a druggability score for novel targets. Our model incorporates 70 protein features, including properties derived from the sequence, features characterizing protein function, and network properties derived from the protein-protein interaction network. The advantage of this approach is that it is unbiased: because most of the features are independent of the accumulated literature, even less-studied proteins with limited information about their function can score well. We built models on a training set consisting of targets of approved drugs and a negative set of non-drug targets. The machine learning techniques help identify the combination of features that best differentiates validated targets from non-targets. We validated our predictions on an independent set of clinical trial drug targets, achieving high accuracy, with an Area Under the Curve (AUC) of 0.89. Our most predictive features included the biological function of proteins, network centrality measures, protein essentiality, tissue specificity, localization, and solvent accessibility. Our predictions, based on a small set of 102 validated oncology targets, recovered the majority of known drug targets and identified a novel set of proteins as drug target candidates.

CONCLUSIONS We developed a machine learning approach to prioritize proteins according to their similarity to approved drug targets. We have shown that the proposed method is highly predictive on a validation dataset of 277 targets of clinical trial drugs, confirming that our computational approach is an efficient and cost-effective tool for drug target discovery and prioritization. Because our predictions were based on oncology targets and cancer-relevant biological functions, targets of oncology clinical trial drugs received significantly higher scores than targets of trial drugs for other indications. Our approach can be used to make indication-specific drug target predictions by combining generic druggability features with indication-specific biological functions.

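The validation result is summarized as an AUC of 0.89 on independent clinical trial targets. Assuming this is the area under the ROC curve, it equals the probability that a randomly chosen positive (a clinical trial target) scores higher than a randomly chosen negative, with ties counted as one half; this is the Mann-Whitney U statistic divided by the number of positive/negative pairs. The sketch below computes it directly from two score lists. The function name `roc_auc` and the scores are hypothetical, not data from the paper.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    Counts, over all positive/negative pairs, how often the positive
    scores higher; ties contribute 0.5. Runs in O(n_pos * n_neg),
    which is fine for a few hundred targets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Made-up scores for illustration only.
    trial_targets = [0.91, 0.80, 0.75, 0.40]
    non_targets = [0.30, 0.45, 0.20, 0.10]
    print(roc_auc(trial_targets, non_targets))  # prints 0.9375 (15 of 16 pairs ranked correctly)
```

The same pairwise statistic could also quantify how often oncology trial targets outscore trial targets for other indications, although the abstract does not state which statistical test the authors used for that comparison.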
Updated: 2020-04-22