Abstract
Understanding the complex connectivity structure of the brain is a major challenge in neuroscience. A vast and growing literature on neuronal connectivity between brain regions already exists in published research articles and databases. However, as the number of published articles and repositories keeps increasing, it becomes difficult for a neuroscientist to engage with the breadth and depth of any given field within neuroscience. Natural Language Processing (NLP) techniques can be used to mine ‘Brain Region Connectivity’ information from published articles and build a centralized connectivity resource that gives neuroscience researchers quick access to research findings. Manually curating and continuously updating such a resource involves significant time and effort. This paper presents an application of supervised machine learning algorithms that perform shallow and deep linguistic analysis of text to automatically extract connectivity between brain region mentions. The proposed algorithms are evaluated on benchmark datasets collated from PubMed and on our own dataset of full-text articles annotated by a domain expert. We also present a comparison with state-of-the-art methods, including BioBERT. The proposed methods achieve the best recall and \(F_2\) scores without requiring any domain-specific predefined linguistic patterns. Our paper presents a novel effort towards automatically generating interpretable patterns of connectivity for extracting connected brain region mentions from text, and the approach can be expanded to include other domain-specific information.
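For readers unfamiliar with the \(F_2\) score mentioned above: it is the standard \(F_\beta\) measure with \(\beta = 2\), which weights recall more heavily than precision. The following is a minimal sketch of that standard definition; the function name `f_beta` and the precision and recall values in the usage line are illustrative assumptions, not code or results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta > 1 favours recall; beta = 2 gives the F2 score.
    (Hypothetical helper for illustration, not the authors' implementation.)
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero; define the score as zero.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only (not taken from the paper).
f1 = f_beta(0.5, 0.8, beta=1.0)
f2 = f_beta(0.5, 0.8, beta=2.0)
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

With recall higher than precision, as in these made-up values, the \(F_2\) score is higher than the \(F_1\) score, which is why \(F_2\) suits extraction tasks where missing a true connection is costlier than reporting a spurious one.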
Data Availability
Models generated on the benchmark corpus WhiteText and supplementary material, including the datasets, are available at https://github.com/ashika55/BRConnExt.
Funding
Not applicable.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Code availability
The software application ‘ConnExt1’ can be accessed at http://brainarchitecture.org/text-mining.
About this article
Cite this article
Sharma, A., Jayakumar, J., Mitra, P.P. et al. Application of Supervised Machine Learning to Extract Brain Connectivity Information from Neuroscience Research Articles. Interdiscip Sci Comput Life Sci 13, 731–750 (2021). https://doi.org/10.1007/s12539-021-00443-6
DOI: https://doi.org/10.1007/s12539-021-00443-6