pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms.
BMC Bioinformatics (IF 2.9), Pub Date: 2020-06-18, DOI: 10.1186/s12859-020-03583-6
Zhi-Hui Luo 1, 2 , Meng-Wei Shi 1, 2 , Zhuang Yang 1, 2 , Hong-Yu Zhang 3 , Zhen-Xia Chen 1, 2

Many disease-causing genes have been identified through different methods, but the disease phenotypes of these genes still lack uniform biomedical named entity (bio-NE) annotations. Furthermore, semantic similarity comparison between two bio-NE annotations has become important for data integration and systems genetics analysis. The package pyMeSHSim recognizes bio-NEs by using MetaMap, which produces Unified Medical Language System (UMLS) concepts through natural language processing. To map the UMLS concepts to Medical Subject Headings (MeSH), pyMeSHSim embeds an in-house dataset containing the main headings (MHs), supplementary concept records (SCRs), and their relations in MeSH. Based on this dataset, pyMeSHSim implements four information content (IC)-based algorithms and one graph-based algorithm to measure the semantic similarity between two MeSH terms. To evaluate its performance, we used pyMeSHSim to parse OMIM and GWAS phenotypes. pyMeSHSim introduces SCRs and a curation strategy for non-MeSH-synonymous UMLS concepts, which improved its performance in recognizing OMIM phenotypes. In the curation of 461 GWAS phenotypes, pyMeSHSim showed recall > 0.94, precision > 0.56, and F1 > 0.70, outperforming the state-of-the-art tools DNorm and TaggerOne in recognizing MeSH terms from short biomedical phrases. The semantic similarity between the MeSH terms recognized by pyMeSHSim and those assigned in previous manual work was calculated with both pyMeSHSim and another semantic analysis tool, meshes. The similarity scores produced by the two tools were highly correlated, with correlations of 0.89–0.99. The integrative MeSH tool pyMeSHSim, with its embedded MeSH MHs and SCRs, supports bio-NE recognition, normalization, and comparison in biomedical text mining.
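
The abstract does not name the four IC-based measures, so the following is only a minimal sketch of what an IC-based similarity looks like, using two well-known measures (Resnik and Lin) on a toy hierarchy. The term names, the PARENTS table, and all function names are hypothetical illustrations and are not pyMeSHSim's API or dataset; the information content here is an assumed intrinsic variant derived from descendant counts, whereas corpus-based IC is also common.

```python
import math

# Hypothetical toy hierarchy (term -> parents); NOT real MeSH data.
PARENTS = {
    "Diseases": [],
    "Neoplasms": ["Diseases"],
    "Cardiovascular Diseases": ["Diseases"],
    "Breast Neoplasms": ["Neoplasms"],
    "Lung Neoplasms": ["Neoplasms"],
    "Heart Diseases": ["Cardiovascular Diseases"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Intrinsic IC (an assumption): -log of the fraction of terms at or below `term`."""
    at_or_below = sum(1 for t in PARENTS if term in ancestors(t))
    return -math.log(at_or_below / len(PARENTS))


def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)


def lin(a, b):
    """Lin similarity: shared IC normalized by the ICs of both terms."""
    denom = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denom if denom else 1.0


if __name__ == "__main__":
    print(lin("Breast Neoplasms", "Lung Neoplasms"))
    print(lin("Breast Neoplasms", "Heart Diseases"))
```

In this toy setting, two terms under the same specific parent score higher than two terms that share only the root, which is the behaviour an IC-based measure is meant to capture.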

Updated: 2020-06-18