Recognition of Disease Genetic Information from Unstructured Text Data Based on BiLSTM-CRF for Molecular Mechanisms
Security and Communication Networks ( IF 1.968 ) Pub Date : 2021-02-19 , DOI: 10.1155/2021/6635027
Lejun Gong 1, 2 , Xingxing Zhang 1 , Tianyin Chen 1 , Li Zhang 3

Recognizing disease-relevant entities is an important task in mining unstructured text from the biomedical literature to acquire biomedical knowledge. Autism spectrum disorder (ASD) is a neurological and developmental disorder characterized by deficits in communication and social interaction and by repetitive behaviour. However, its causes remain unclear to date. In this study, we use a computational machine-learning approach to identify disease-associated entities in a collection of texts related to ASD, with the aim of exploring its molecular mechanisms. Disease-related entities are extracted from the autism-related biomedical literature using a deep learning model that combines bidirectional long short-term memory (BiLSTM) with a conditional random field (CRF). Compared with previous work, the approach is promising for identifying disease-related entities. Evaluated on the GENIA corpus across five types of molecular entities, the proposed approach obtains an F-score of 76.81%. After removing repeated molecular entities, the work extracted 9146 proteins, 145 RNAs, 7680 DNAs, 1058 cell types, and 981 cell lines from the autism biomedical literature. Finally, we perform GO and KEGG analyses of the test dataset. This study could serve as a reference for further studies on the molecular etiology of disease and provides a way to explore disease genetic information.
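The abstract names the pipeline but not its decoding or scoring details. The following is a minimal pure-Python sketch under stated assumptions, not the authors' code: it assumes BIO tagging (e.g. `B-protein`, `I-protein`, `O`), assumes the BiLSTM has already produced per-token emission scores, and uses hypothetical function names (`viterbi_decode`, `bio_to_spans`, `entity_f1`, `count_unique`). It shows CRF Viterbi decoding, exact-match entity-level F-score (assumed here as the GENIA evaluation metric), and deduplication of extracted entities by exact surface string (an assumption about how repeated entities were removed).

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical span type: (start token, end token exclusive, entity type).
Span = Tuple[int, int, str]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Best tag-index path under a linear-chain CRF.

    emissions[t][j]: score for tag j at token t (assumed to come from the BiLSTM).
    transitions[i][j]: learned score for tag i followed by tag j.
    """
    if not emissions:
        return []
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        prev_best: List[int] = []
        new_score: List[float] = []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            i = max(range(n_tags), key=lambda k: score[k] + transitions[k][j])
            prev_best.append(i)
            new_score.append(score[i] + transitions[i][j] + emissions[t][j])
        backpointers.append(prev_best)
        score = new_score
    # Follow back-pointers from the highest-scoring final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for bp in reversed(backpointers):
        best = bp[best]
        path.append(best)
    return path[::-1]


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect (start, end, type) spans from BIO tags; a stray I- tag opens a new span."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None and etype is not None:
                spans.add((start, i, etype))
            if tag == "O":
                start, etype = None, None
            else:
                start, etype = i, tag[2:]
    return spans


def entity_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Exact-match, entity-level F-score over sentences of BIO tags."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def count_unique(sentences: Sequence[Sequence[str]],
                 tag_seqs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Count distinct entity strings per type (assumed dedup rule: exact string match)."""
    seen: Dict[str, Set[str]] = {}
    for tokens, tags in zip(sentences, tag_seqs):
        for start, end, etype in bio_to_spans(tags):
            seen.setdefault(etype, set()).add(" ".join(tokens[start:end]))
    return {etype: len(names) for etype, names in seen.items()}
```

In a full BiLSTM-CRF, the transition matrix is learned jointly with the BiLSTM by maximizing the CRF log-likelihood (computed with the forward algorithm); that training step is omitted here, and only inference, scoring, and deduplication are sketched.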

Updated: 2021-02-19