NLM-Chem, a new resource for chemical entity recognition in PubMed full text literature
Scientific Data (IF 5.8), Pub Date: 2021-03-25, DOI: 10.1038/s41597-021-00875-1
Rezarta Islamaj 1 , Robert Leaman 1 , Sun Kim 1 , Dongseop Kwon 1 , Chih-Hsuan Wei 1 , Donald C Comeau 1 , Yifan Peng 1 , David Cissel 1 , Cathleen Coss 1 , Carol Fisher 1 , Rob Guzman 1 , Preeti Gokal Kochar 1 , Stella Koppel 1 , Dorothy Trinh 1 , Keiko Sekiya 1 , Janice Ward 1 , Deborah Whitman 1 , Susan Schmidt 1 , Zhiyong Lu 1

Automatically identifying chemical and drug names in scientific publications advances information access for this important class of entities in a variety of biomedical disciplines by enabling improved retrieval and linkage to related concepts. While current methods for tagging chemical entities were developed for the article title and abstract, their performance in the full article text is substantially lower. However, the full text frequently contains more detailed chemical information, such as the properties of chemical compounds, their biological effects and interactions with diseases, genes and other chemicals. We therefore present the NLM-Chem corpus, a full-text resource to support the development and evaluation of automated chemical entity taggers. The NLM-Chem corpus consists of 150 full-text articles, doubly annotated by ten expert NLM indexers, with ~5000 unique chemical name annotations, mapped to ~2000 MeSH identifiers. We also describe a substantially improved chemical entity tagger, with automated annotations for all of PubMed and PMC freely accessible through the PubTator web-based interface and API. The NLM-Chem corpus is freely available.




Updated: 2021-03-25