Systematic Homonym Detection and Replacement Based on Contextual Word Embedding

Lee, Younghoon

doi:10.1007/s11063-020-10376-8

Published: 20 October 2020

Volume 53, pages 17–36, (2021)

Neural Processing Letters

Younghoon Lee ORCID: orcid.org/0000-0003-4199-936X¹

Abstract

Homonyms are words that share a spelling but differ in meaning, and they are a common feature of most languages. They are a source of noise in most text analyses and are difficult to detect, and numerous studies have addressed this problem. However, extant methods typically detect homonyms using rule-based or statistical approaches, which require an answer set and pay little regard to the semantic meaning of the word. Therefore, we propose a novel approach to homonym detection based on contextual word embedding, which allows a word to be understood according to the context in which it appears. In this study, we extracted all contextual word embedding vectors of each individual word and clustered those vectors using spherical k-means clustering to detect pairs of homonyms. In addition, we developed a homonym replacement method to improve the performance of document embedding techniques that are based on word vector values. Using the proposed homonym detection method, we replaced the embedding vectors of homonyms with a representative vector for each respective meaning. Experimental results indicate that the proposed method effectively detects homonyms and significantly improves the performance of document embedding.
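
The detection and replacement steps described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes the contextual embedding vectors of one word have already been extracted, and uses synthetic vectors as stand-ins. The function names, the farthest-point initialization, the similarity threshold of 0.5 for flagging a homonym candidate, and the use of cluster centroids as the representative vectors are all hypothetical choices made for illustration; the abstract specifies none of them.

```python
import numpy as np


def normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def spherical_kmeans(vectors, k=2, n_iter=50):
    """Cluster vectors by cosine similarity; returns (labels, unit centroids)."""
    x = normalize(np.asarray(vectors, dtype=float))
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [x[0]]
    for _ in range(1, k):
        best_sim = np.max(x @ np.array(centroids).T, axis=1)
        centroids.append(x[np.argmin(best_sim)])
    centroids = np.array(centroids)
    labels = np.full(len(x), -1)
    for _ in range(n_iter):
        # Assign each vector to the centroid with the highest cosine similarity.
        new_labels = np.argmax(x @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = x[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = normalize(members.sum(axis=0, keepdims=True))[0]
    return labels, centroids


def detect_and_replace(vectors, k=2, threshold=0.5):
    """Flag a word as a homonym candidate and swap in per-sense vectors.

    ASSUMPTION: the threshold rule and the centroid-as-representative choice
    are hypothetical; the abstract does not specify either. Requires k >= 2.
    """
    labels, centroids = spherical_kmeans(vectors, k)
    sims = centroids @ centroids.T
    min_sim = sims[~np.eye(k, dtype=bool)].min()
    is_homonym = bool(min_sim < threshold)
    # Homonym: every occurrence takes the centroid of its sense cluster.
    replaced = centroids[labels] if is_homonym else np.asarray(vectors, dtype=float)
    return is_homonym, labels, replaced


# Synthetic stand-ins for the contextual embeddings of one surface form:
# 20 occurrences near one sense direction and 20 near another.
rng = np.random.default_rng(0)
dim = 16
sense_a, sense_b = np.eye(dim)[0], np.eye(dim)[1]
occurrences = np.vstack([
    sense_a + 0.05 * rng.normal(size=(20, dim)),
    sense_b + 0.05 * rng.normal(size=(20, dim)),
])
is_homonym, labels, replaced = detect_and_replace(occurrences)
print(is_homonym, labels)
```

In practice the rows of `occurrences` would come from a contextual encoder, one row per occurrence of the word in the corpus. A word whose occurrences form a single tight cluster yields two highly similar centroids and is left unchanged under this threshold rule.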

Acknowledgements

This work was supported by a National Research Foundation of Korea (NRF) Grant funded by the Korean Government (MSIT) (No. 2020-0795).

Author information

Authors and Affiliations

Department of Industrial Engineering, Seoul National University of Science and Technology, 232, Gongneung-ro, Nowon-gu, Seoul, 01811, Republic of Korea
Younghoon Lee

Corresponding author

Correspondence to Younghoon Lee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Lee, Y. Systematic Homonym Detection and Replacement Based on Contextual Word Embedding. Neural Process Lett 53, 17–36 (2021). https://doi.org/10.1007/s11063-020-10376-8

Accepted: 09 October 2020
Published: 20 October 2020
Issue Date: February 2021
DOI: https://doi.org/10.1007/s11063-020-10376-8
