Automatic Recognition of Chemical Entity Mentions in Texts of Scientific Publications

Biziukova, N. Yu.; Tarasova, O. A.; Rudik, A. V.; Filimonov, D. A.; Poroikov, V. V.

doi:10.3103/S0005105520060023

AUTOMATED TEXT PROCESSING
Published: 26 February 2021

Volume 54, pages 306–315 (2020)

Automatic Documentation and Mathematical Linguistics

N. Yu. Biziukova¹,², O. A. Tarasova¹, A. V. Rudik¹, D. A. Filimonov¹ & V. V. Poroikov¹

Abstract—

The large volume of experimental data on biological and chemical objects and their interactions drives rapid growth in the number of scientific publications that analyze these data. Such data may include characteristics of low-molecular-weight compounds, the results of evaluating their biological activity, their interactions with human and animal proteins, methods of synthesis of organic compounds, and their classification. Over the past decades, methods have been developed for the automated extraction of data from the texts of scientific publications, including methods for retrieving the names of organic compounds. These data can be used for the automatic identification of the names of organic compounds, including all possible synonyms. Since the topics of scientific publications are diverse, the extracted data can be applied to obtain information about (1) the classification of organic compounds; (2) methods of synthesis of a given organic compound; (3) the physicochemical properties of this compound; (4) its interaction with high-molecular-weight compounds (including proteins and mRNA of animals and humans); and (5) the therapeutic properties of organic compounds, the active substance of the drug, and data on clinical trials. This review considers methods for searching for and extracting data on the names of low-molecular-weight compounds and on their interactions with animal and human proteins (biological objects), as well as data on experimentally confirmed biological activity and the effects of organic compounds (including drugs) on pathological processes. We discuss the methods developed and the results of their application published over the past 10 years.

Funding

The study was financed by the Russian Science Foundation (project no. 19-15-00396).

Author information

Authors and Affiliations

Orekhovich Research Institute of Biomedical Chemistry, 119435, Moscow, Russia
N. Yu. Biziukova, O. A. Tarasova, A. V. Rudik, D. A. Filimonov & V. V. Poroikov
Pirogov Russian National Research Medical University, 117997, Moscow, Russia
N. Yu. Biziukova

Corresponding authors

Correspondence to N. Yu. Biziukova, O. A. Tarasova, A. V. Rudik, D. A. Filimonov or V. V. Poroikov.

Ethics declarations

The authors declare that they have no conflicts of interest.

About this article

Cite this article

Biziukova, N.Y., Tarasova, O.A., Rudik, A.V. et al. Automatic Recognition of Chemical Entity Mentions in Texts of Scientific Publications. Autom. Doc. Math. Linguist. 54, 306–315 (2020). https://doi.org/10.3103/S0005105520060023

Received: 31 August 2020
Published: 26 February 2021
Issue Date: November 2020
DOI: https://doi.org/10.3103/S0005105520060023
