PALMER: improving pathway annotation based on the biomedical literature mining with a constrained latent block model
BMC Bioinformatics (IF 3), Pub Date: 2020-10-02, DOI: 10.1186/s12859-020-03756-3
Jin Hyun Nam 1, 2 , Daniel Couch 1 , Willian A da Silveira 3 , Zhenning Yu 1 , Dongjun Chung 4

In systems biology, it is of great interest to identify previously unreported associations between genes, and the biomedical literature has recently been recognized as a valuable resource for this purpose. Classical clustering algorithms are widely used to investigate associations among genes, but they are not tuned for literature mining data and rest on strong assumptions, such as homogeneity and independence among observations. These assumptions are often violated in this type of data because of redundancies in functional descriptions and biological functions shared among genes. Latent block models are an alternative in this case, but they often perform suboptimally, especially when signals are weak. In addition, they do not allow researchers to utilize valuable prior biological knowledge, such as that available in existing databases. To address these limitations, we propose PALMER, a constrained latent block model that identifies indirect relationships among genes based on biomedical literature mining data. By automatically associating relevant Gene Ontology terms, PALMER facilitates biological interpretation of novel findings without laborious downstream analyses. PALMER also allows researchers to utilize prior biological knowledge about known gene-pathway relationships to guide the identification of gene–gene associations. We evaluated PALMER with simulation studies and with applications to studies of pathway-modulating genes relevant to cancer signaling pathways, using biological pathway annotations available in the KEGG database as prior knowledge. We showed that PALMER outperforms traditional latent block models and that, by utilizing prior biological knowledge, it provides reliable identification of novel gene–gene associations, especially when signals are weak in the biomedical literature mining dataset.
We believe that PALMER and its accompanying user-friendly software will be powerful tools for improving existing pathway annotations and identifying novel pathway-modulating genes.

Updated: 2020-10-02