NERO: A Biomedical Named-entity (Recognition) Ontology with a Large, Annotated Corpus Reveals Meaningful Associations Through Text Embedding
bioRxiv - Systems Biology. Pub Date: 2020-11-06, DOI: 10.1101/2020.11.05.368969
Kanix Wang, Robert Stevens, Halima Alachram, Yu Li, Larisa Soldatova, Ross King, Sophia Ananiadou, Maolin Li, Fenia Christopoulou, Jose Luis Ambite, Sahil Garg, Ulf Hermjakob, Daniel Marcu, Emily Sheng, Tim Beißbarth, Edgar Wingender, Aram Galstyan, Xin Gao, Brendan Chambers, Bohdan B. Khomtchouk, James A. Evans, Andrey Rzhetsky

Machine reading is essential for unlocking valuable knowledge contained in the millions of existing biomedical documents. Over the last two decades, the most dramatic advances in machine reading have followed in the wake of critical corpus development. Large, well-annotated corpora have been associated with punctuated advances in machine reading methodology and automated knowledge extraction systems in the same way that ImageNet was fundamental for developing computer vision techniques. This study contributes six components to an advanced named-entity analysis tool for biomedicine: (a) a new Named-Entity Recognition Ontology (NERO) developed specifically for describing entities in biomedical texts, which accounts for diverse levels of ambiguity and bridges the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named-entity classes; (c) pictographs for all named entities, to ease the burden of annotation for curators; (d) an original annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf automated named-entity recognition extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

Updated: 2020-11-09