Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition

  • Special Issue Article
Computing

Abstract

Large volumes of new data are now readily available from many sources to collect and store. A central problem is labeling these data correctly for machine learning applications. Active learning is a form of machine learning widely used to address this problem because it significantly reduces the amount of labeled data required. It selects the most appropriate samples from the unlabeled data, has an oracle label them correctly, and passes them to the active learner for incremental training. Several query sampling strategies exist for selecting these samples. A major drawback of the active learning approach is that it is very time-consuming. This research work therefore proposes a new multi-core-based algorithm that speeds up active learning by using all of the computational resources available in the system. The experiments address named entity recognition, which labels sequences of words in unstructured text by classifying them into predefined categories. The proposed algorithm is evaluated in terms of both performance and execution time on three named entity recognition corpora from distinct biomedical domains. The results show a considerable reduction in execution time for the proposed active learning algorithm compared with the existing active learning approach.
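The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea and not the authors' method: score each unlabeled sample with least confidence (one minus the probability of its most likely label), spread the scoring across processes, and return the most uncertain samples for the oracle to label. The function names, the `workers` parameter, and the use of a plain per-sample probability vector are all assumptions made for illustration. For named entity recognition, the probability would typically be that of the most likely label sequence produced by a sequence model.

```python
from concurrent.futures import ProcessPoolExecutor
import os


def least_confidence(probs):
    """Uncertainty of one sample: 1 minus the probability of its top label."""
    return 1.0 - max(probs)


def score_chunk(chunk):
    """Score a chunk of (index, probs) pairs; this is the unit of parallel work."""
    return [(idx, least_confidence(probs)) for idx, probs in chunk]


def split_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def select_queries(pool_probs, batch_size, workers=None):
    """Return indices of the batch_size least-confident pool samples.

    pool_probs: one list of class probabilities per unlabeled sample
    (hypothetical input format, assumed for this sketch).
    workers: number of processes; defaults to the number of CPU cores.
    """
    workers = workers or os.cpu_count() or 1
    indexed = list(enumerate(pool_probs))
    chunks = split_chunks(indexed, workers)
    if workers <= 1 or len(chunks) <= 1:
        # Serial fallback: same scoring, no process pool.
        scored = [pair for chunk in chunks for pair in score_chunk(chunk)]
    else:
        # Each chunk is scored in a separate process.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            scored = [pair for part in pool.map(score_chunk, chunks) for pair in part]
    # Highest uncertainty first; ties broken by original index for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [idx for idx, _ in scored[:batch_size]]
```

For a pool such as `[[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]` with `batch_size=2`, the two samples whose top probability is lowest (indices 1 and 2) are selected. When run with more than one worker on platforms that spawn new processes, the call should sit under an `if __name__ == "__main__":` guard. Process start-up and data transfer have a cost, so a sketch like this only pays off when the pool is large or scoring each sample is expensive.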

Author information

Corresponding author

Correspondence to Ankit Agrawal.

About this article

Cite this article

Agrawal, A., Tripathi, S. & Vardhan, M. Multicore based least confidence query sampling strategy to speed up active learning approach for named entity recognition. Computing 105, 979–997 (2023). https://doi.org/10.1007/s00607-021-01000-1

  • DOI: https://doi.org/10.1007/s00607-021-01000-1
