Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval

Sharma, Vijay Kumar; Mittal, Namita; Vidyarthi, Ankit

doi:10.1007/s11042-021-11074-w

1207: Innovations in Multimedia Information Processing & Retrieval
Published: 11 June 2021

Volume 82, pages 8197–8212 (2023)

Multimedia Tools and Applications

Abstract

Cross-Lingual Information Retrieval (CLIR) enables a user to query in a language that differs from the language of the target documents. CLIR incorporates a translation technique based on either a manual dictionary or a probabilistic dictionary generated from a parallel corpus. Translation techniques for Hindi suffer from a translation mis-mapping issue caused by the morphological richness of the language. In addition, a word may have multiple translations in a dictionary, which leads to a word translation disambiguation issue. This paper presents two techniques, Semantic Morphological Variant Selection (SMVS) and Hybrid Word Translation Disambiguation (HWTD): the former resolves the translation mis-mapping issue and the latter disambiguates the queries more effectively. The proposed techniques are evaluated on FIRE ad-hoc datasets, where SMVS and HWTD at the word level achieve better evaluation measures than the baseline Statistical Machine Translation.
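
The two problems named in the abstract can be illustrated with a small sketch. This is not the paper's SMVS or HWTD algorithm, which the abstract does not specify. It is a minimal illustration under stated assumptions: a query word missing from the dictionary is mapped to the headword that shares a prefix with it and is closest in an embedding space, and an ambiguous word takes the translation most similar to the translations of the other query words. The function names, the romanized Hindi tokens, the toy dictionary, and the two-dimensional vectors are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is empty or all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def common_prefix_len(a, b):
    """Number of leading characters shared by strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def select_variant(word, dictionary, src_vectors, min_prefix=3):
    """Return a dictionary headword for `word`, or None if nothing fits.

    Assumption: a morphological variant shares at least `min_prefix` leading
    characters with its headword and lies close to it in embedding space.
    """
    if word in dictionary:
        return word
    if word not in src_vectors:
        return None
    candidates = [h for h in dictionary
                  if h in src_vectors and common_prefix_len(word, h) >= min_prefix]
    if not candidates:
        return None
    return max(candidates, key=lambda h: cosine(src_vectors[word], src_vectors[h]))


def translate_query(query, dictionary, src_vectors, tgt_vectors):
    """Translate each query word, choosing among candidates by context similarity."""
    heads = [select_variant(w, dictionary, src_vectors) for w in query]
    output = []
    for i, head in enumerate(heads):
        if head is None:
            output.append(query[i])  # no usable headword: keep the source word
            continue
        # Context: every translation candidate of the other query words.
        context = [t for j, h in enumerate(heads) if j != i and h is not None
                   for t in dictionary[h]]

        def score(candidate):
            sims = [cosine(tgt_vectors.get(candidate, ()), tgt_vectors.get(c, ()))
                    for c in context]
            return sum(sims) / len(sims) if sims else 0.0

        output.append(max(dictionary[head], key=score))
    return output


# Toy data (hypothetical): "sona" is ambiguous, "keematon" is an inflected
# form of "keemat" that the dictionary does not list.
DICTIONARY = {"sona": ["sleep", "gold"], "keemat": ["price"], "keeda": ["insect"]}
SRC_VECTORS = {"keematon": (1.0, 0.1), "keemat": (1.0, 0.0), "keeda": (0.0, 1.0)}
TGT_VECTORS = {"gold": (1.0, 0.1), "price": (0.9, 0.2), "sleep": (0.0, 1.0)}

print(translate_query(["sona", "keematon"], DICTIONARY, SRC_VECTORS, TGT_VECTORS))
```

With this toy data the call prints `['gold', 'price']`: "keematon" falls back to the headword "keemat", and "gold" is preferred over "sleep" because it is closer to "price". A real system would replace the prefix test and the hand-made vectors with a proper morphological resource and trained embeddings.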

Author information

Authors and Affiliations

Malaviya National Institute of Technology, Jaipur, India
Vijay Kumar Sharma & Namita Mittal
Department of CSE & IT, Jaypee Institute of Information Technology, Noida, India
Ankit Vidyarthi

Corresponding author

Correspondence to Ankit Vidyarthi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sharma, V.K., Mittal, N. & Vidyarthi, A. Semantic morphological variant selection and translation disambiguation for cross-lingual information retrieval. Multimed Tools Appl 82, 8197–8212 (2023). https://doi.org/10.1007/s11042-021-11074-w

Received: 15 October 2020
Revised: 01 April 2021
Accepted: 11 May 2021
Published: 11 June 2021
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11042-021-11074-w
