Application of Supervised Machine Learning to Extract Brain Connectivity Information from Neuroscience Research Articles

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Understanding the complex connectivity structure of the brain is a major challenge in neuroscience. A vast and ever-expanding literature on neuronal connectivity between brain regions already exists in published research articles and databases. However, as the number of published articles and repositories grows, it becomes difficult for a neuroscientist to engage with the breadth and depth of any given field within neuroscience. Natural Language Processing (NLP) techniques can be used to mine ‘Brain Region Connectivity’ information from published articles and build a centralized connectivity resource that gives neuroscience researchers quick access to research findings. Manually curating and continuously updating such a resource requires significant time and effort. This paper presents an application of supervised machine learning algorithms that perform shallow and deep linguistic analysis of text to automatically extract connectivity between brain region mentions. The proposed algorithms are evaluated on benchmark datasets collated from PubMed and on our own dataset of full-text articles annotated by a domain expert. We also present a comparison with state-of-the-art methods, including BioBERT. The proposed methods achieve the best recall and \(F_2\) scores in this comparison while negating the need for any domain-specific predefined linguistic patterns. Our paper presents a novel effort towards automatically generating interpretable patterns of connectivity for extracting connected brain region mentions from text, and the approach can be expanded to include other domain-specific information.
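
The abstract reports \(F_2\) scores without defining them. Assuming this is the standard \(F_\beta\) measure with \(\beta = 2\), that is \(F_\beta = (1+\beta^2)PR / (\beta^2 P + R)\) for precision \(P\) and recall \(R\), it weights recall more heavily than precision, which suits a curation setting where a missed connection costs more than a false alarm. The sketch below is illustrative only: the function name `f_beta` and the counts are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """Return the F-beta score from raw counts.

    tp, fp, fn are true positives, false positives and false negatives.
    beta > 1 weights recall above precision; beta = 2 gives F_2.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid dividing by zero when nothing was found correctly
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for two extractors (NOT results from the paper).
high_recall = f_beta(tp=80, fp=40, fn=20)     # recall 0.80, lower precision
high_precision = f_beta(tp=60, fp=10, fn=40)  # recall 0.60, higher precision

print(f"F2, recall-oriented extractor:    {high_recall:.3f}")
print(f"F2, precision-oriented extractor: {high_precision:.3f}")
```

Under these assumed counts the recall-oriented extractor scores higher on \(F_2\) even though its precision is lower, which is the behaviour the metric is designed to reward.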

Data Availability

Models generated on the benchmark corpus WhiteText and supplementary material, including the datasets, are available at https://github.com/ashika55/BRConnExt.

Notes

  1. http://brainarchitecture.org/text-mining

Funding

Not applicable.

Author information

Corresponding author

Correspondence to Ashika Sharma.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Code availability

The software application ‘ConnExt’ can be accessed at http://brainarchitecture.org/text-mining.

About this article

Cite this article

Sharma, A., Jayakumar, J., Mitra, P.P. et al. Application of Supervised Machine Learning to Extract Brain Connectivity Information from Neuroscience Research Articles. Interdiscip Sci Comput Life Sci 13, 731–750 (2021). https://doi.org/10.1007/s12539-021-00443-6

  • DOI: https://doi.org/10.1007/s12539-021-00443-6
