A Text Mining Pipeline Using Active and Deep Learning Aimed at Curating Information in Computational Neuroscience.
Neuroinformatics (IF 3) Pub Date: 2018-11-15, DOI: 10.1007/s12021-018-9404-y
Matthew Shardlow 1, Meizhi Ju 1, Maolin Li 1, Christian O'Reilly 2, Elisabetta Iavarone 2, John McNaught 1, Sophia Ananiadou 1

The curation of neuroscience entities is crucial to ongoing efforts in neuroinformatics and computational neuroscience, such as those deployed in the context of continuing large-scale brain modelling projects. However, manually sifting through thousands of articles for new information about modelled entities is a painstaking and low-reward task. Text mining can help a curator extract relevant information from this literature in a systematic way. We propose the application of text mining methods to the neuroscience literature. Specifically, two computational neuroscientists annotated a corpus of entities pertinent to neuroscience, using active learning techniques to enable swift, targeted annotation. We then trained machine learning models to recognise the annotated entity types: Neuron Types, Brain Regions, Experimental Values, Units, Ion Currents, Channels, Conductances, and Model Organisms. We tested a traditional rule-based approach, a conditional random field, and a deep learning named entity recognition model, and found the deep learning model to be superior. Our final results show that we can detect a range of named entities of interest to the neuroscientist with a macro-averaged precision, recall, and F1 score of 0.866, 0.817, and 0.837 respectively. The contributions of this work are as follows: 1) We provide a set of Named Entity Recognition (NER) tools capable of detecting neuroscience entities with performance above or comparable to prior work. 2) We propose a methodology for training NER tools for neuroscience that requires very little training data to achieve strong performance, and that can be adapted to any sub-domain within neuroscience. 3) We provide a small annotated corpus covering multiple entity types, together with annotation guidelines to help others reproduce our experiments.
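
The macro-averaged precision, recall and F1 reported above are unweighted means of the per-entity-type scores. Below is a minimal Python sketch of such an evaluation, assuming exact-span, exact-type matching of entity mentions; the entity-type labels, tuple layout and example mentions are illustrative and are not taken from the paper's released materials.

def macro_prf(gold, pred, entity_types):
    """Per-type and macro-averaged precision/recall/F1 for NER output.

    gold and pred are sets of (doc_id, start, end, entity_type) tuples,
    i.e. entity mentions identified by their exact character span.
    """
    per_type = {}
    for etype in entity_types:
        g = {m for m in gold if m[3] == etype}
        p = {m for m in pred if m[3] == etype}
        tp = len(g & p)                      # exact-span, exact-type matches
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_type[etype] = (prec, rec, f1)
    # Macro average: unweighted mean of the per-type scores.
    n = len(entity_types)
    macro = tuple(sum(scores[i] for scores in per_type.values()) / n
                  for i in range(3))
    return per_type, macro

# Illustrative call with invented mentions; a real evaluation would use
# the annotated corpus described in the paper.
entity_types = ["NeuronType", "BrainRegion", "ExperimentalValue", "Units",
                "IonCurrent", "Channel", "Conductance", "ModelOrganism"]
gold = {("doc1", 0, 14, "NeuronType"), ("doc1", 30, 42, "BrainRegion")}
pred = {("doc1", 0, 14, "NeuronType"), ("doc1", 50, 53, "Units")}
per_type, (macro_p, macro_r, macro_f1) = macro_prf(gold, pred, entity_types)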
