Text-mining clinically relevant cancer biomarkers for curation into the CIViC database.
Genome Medicine (IF 12.3), Pub Date: 2019-12-03, DOI: 10.1186/s13073-019-0686-y
Jake Lever 1, 2 , Martin R Jones 1 , Arpad M Danos 3 , Kilannin Krysiak 3, 4 , Melika Bonakdar 1 , Jasleen K Grewal 1, 2 , Luka Culibrk 1, 2 , Obi L Griffith 3, 4, 5, 6 , Malachi Griffith 3, 4, 5, 6 , Steven J M Jones 1, 2, 7

BACKGROUND: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation by skilled experts who read and interpret the relevant biomedical literature.

METHODS: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text-mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase.

RESULTS: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications.

CONCLUSIONS: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general. All data are publicly available and distributed under a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/.

Updated: 2020-04-22