Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery.
BMC Bioinformatics ( IF 2.9 ) Pub Date : 2020-03-11 , DOI: 10.1186/s12859-020-3344-x
Xin Guan 1, 2 , George Runger 1 , Li Liu 1, 3, 4

In biomarker discovery, applying domain knowledge is an effective way to eliminate false-positive features, prioritize functionally impactful markers, and facilitate the interpretation of predictive signatures. Several computational methods formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require prior information to be encoded as a single score, and their algorithms are optimized for one specific type of biological knowledge. In practice, however, domain knowledge from diverse resources can provide complementary information, and no current method can integrate such heterogeneous prior information for biomarker discovery.

To address this problem, we developed Know-GRRF (knowledge-guided regularized random forest), a method that dynamically incorporates domain knowledge from multiple disciplines to guide feature selection. Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF jointly optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the Akaike information criterion (AIC) of the out-of-bag predictions. The objective is to select a compact feature subset with high discriminative power and strong functional relevance to the biological phenotype. Through rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or by no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer.
We evaluated a combination of cancer-related gene annotations, evolutionary conservation, and pre-computed statistical scores as the prior knowledge for assembling a panel of biomarkers, and discovered a compact set of biomarkers that significantly improved prediction accuracy. Know-GRRF is a powerful new method for incorporating knowledge from multiple domains into feature selection, with a broad range of applications in biomarker discovery. We implemented the method and released it as the KnowGRRF package on CRAN, the R package archive.
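The abstract describes the tuning loop only at a high level, so the following is a minimal Python sketch under stated assumptions, not the KnowGRRF package or the authors' implementation. It assumes that (1) each domain's per-feature prior scores are min-max normalized and combined with nonnegative weights into a composite score s_i; (2) s_i enters a GRRF-style penalty coefficient λ_i = (1 - γ)λ0 + γ·s_i, so features with stronger prior support are penalized less; (3) AIC = 2k - 2 log L, where k is the number of selected features and L is the likelihood of the out-of-bag class probabilities; and (4) a plain grid search stands in for whatever optimizer the authors actually use. All function names are hypothetical, and `evaluate` is a placeholder for fitting a regularized random forest with the given coefficients (for example, through the per-feature `coefReg` argument of the R `RRF` package) and returning the number of selected features plus each sample's out-of-bag probability of its true class.

```python
"""Hypothetical sketch of a Know-GRRF-style tuning loop (assumed formulas, not the KnowGRRF API)."""
import itertools
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Placeholder type: takes per-feature penalty coefficients, returns
# (number of selected features, OOB probability of each sample's true class).
Evaluator = Callable[[List[float]], Tuple[int, List[float]]]


def min_max(xs: Sequence[float]) -> List[float]:
    """Rescale one domain's scores to [0, 1] so domains on different scales are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def composite_score(priors: Dict[str, Sequence[float]],
                    weights: Dict[str, float]) -> List[float]:
    """Assumed linear model: weighted average of the normalized per-domain prior scores."""
    n = len(next(iter(priors.values())))
    total = sum(weights[d] for d in priors)
    if total == 0:
        return [0.0] * n
    norm = {d: min_max(s) for d, s in priors.items()}
    return [sum(weights[d] * norm[d][i] for d in priors) / total for i in range(n)]


def penalty_coefficients(score: Sequence[float], lam0: float, gamma: float) -> List[float]:
    """GRRF-style lambda_i = (1 - gamma) * lam0 + gamma * score_i.
    A larger lambda_i means feature i pays a smaller penalty when first used in a split."""
    return [(1.0 - gamma) * lam0 + gamma * s for s in score]


def oob_aic(true_class_probs: Sequence[float], n_selected: int, eps: float = 1e-12) -> float:
    """AIC = 2k - 2 log L, with log L summed over OOB true-class probabilities (clipped at eps)."""
    log_lik = sum(math.log(max(p, eps)) for p in true_class_probs)
    return 2 * n_selected - 2 * log_lik


def tune(priors: Dict[str, Sequence[float]], evaluate: Evaluator,
         weight_grid: Sequence[float], gammas: Sequence[float],
         lam0: float = 1.0) -> Optional[Tuple[float, Dict[str, float], float]]:
    """Grid-search domain weights and gamma jointly; return (AIC, weights, gamma) with lowest AIC."""
    domains = list(priors)
    best = None
    for w in itertools.product(weight_grid, repeat=len(domains)):
        if sum(w) == 0:  # all-zero weights carry no prior information
            continue
        weights = dict(zip(domains, w))
        score = composite_score(priors, weights)
        for gamma in gammas:
            k, probs = evaluate(penalty_coefficients(score, lam0, gamma))
            aic = oob_aic(probs, k)
            if best is None or aic < best[0]:
                best = (aic, weights, gamma)
    return best
```

With three domains, such as the gene annotations, conservation scores, and statistical scores mentioned above, `priors` would simply have three keys. A full weight grid grows exponentially with the number of domains, which is one reason a real implementation might prefer a continuous optimizer over this exhaustive search.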

Updated: 2020-03-16