Abstract
Semantic similarity measures based on estimating the information content (IC) of concepts are currently regarded as the state of the art. Calculating the IC in an intrinsic (i.e., ontology-based) way is particularly convenient because it is accurate and does not depend on annotated corpora. Intrinsic IC calculation models estimate concept probabilities from the taxonomic knowledge (i.e., the number of hyponyms and/or hypernyms of the concepts) modelled in an ontology. In this paper, we aim to improve the intrinsic calculation of the IC by leveraging not only the hyponyms and hypernyms of concepts, but also the explicit evidence of synonymy and polysemy that ontologies such as WordNet model. Specifically, we propose a more accurate intrinsic estimation of the concept probabilities on which the IC calculation relies. We evaluate the accuracy of our proposal through a comprehensive set of experiments in which our IC calculation model is tested on a variety of IC-based similarity measures and benchmarks. Experimental results show that our proposal obtains consistently good accuracies, which vary less across measures and benchmarks than those of the most prominent intrinsic IC calculation models available in the literature.
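To make the notion of intrinsic IC concrete, the following minimal sketch computes one well-known hyponym-count formulation from the literature, IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of hyponyms of c and N is the total number of concepts. It is not the model proposed in this paper, which additionally exploits synonymy and polysemy. The toy taxonomy and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT taken from WordNet.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def concepts(parent):
    """Return every concept that appears in the taxonomy."""
    return set(parent) | set(parent.values())


def ancestors(concept, parent):
    """Return the hypernyms of `concept`, from its parent up to the root."""
    found = []
    while concept in parent:
        concept = parent[concept]
        found.append(concept)
    return found


def hyponym_count(concept, parent):
    """Count the concepts that have `concept` among their hypernyms."""
    return sum(1 for c in concepts(parent) if concept in ancestors(c, parent))


def intrinsic_ic(concept, parent):
    """Hyponym-count IC: 1 - log(hypo(c) + 1) / log(N)."""
    total = len(concepts(parent))
    return 1.0 - math.log(hyponym_count(concept, parent) + 1) / math.log(total)


def resnik_similarity(a, b, parent):
    """IC of the most informative concept that subsumes both a and b."""
    common = (set(ancestors(a, parent)) | {a}) & (set(ancestors(b, parent)) | {b})
    return max(intrinsic_ic(c, parent) for c in common)


if __name__ == "__main__":
    for name in sorted(concepts(PARENT)):
        print(name, round(intrinsic_ic(name, PARENT), 3))
    print("sim(dog, cat) =", round(resnik_similarity("dog", "cat", PARENT), 3))
    print("sim(dog, oak) =", round(resnik_similarity("dog", "oak", PARENT), 3))
```

In this toy hierarchy, leaves receive the maximum IC and the root receives an IC of zero, so a pair of concepts that only shares the root is judged maximally dissimilar. Because the estimate depends solely on hyponym counts, two concepts with the same number of hyponyms always get the same IC, which illustrates why additional evidence such as synonymy and polysemy may help differentiate them.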
Acknowledgements
This work was partly supported by the European Commission (H2020-700540 project “CANVAS”) and by the Spanish Government (projects TIN2014-57364-C2-2-R “SmartGlacis”, RTI2018-095094-B-C22 “CONSENT” and TIN2016-80250-R “Sec-MCloud”). The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of UNESCO.
About this article
Cite this article
Batet, M., Sánchez, D. Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content. Artif Intell Rev 53, 2023–2041 (2020). https://doi.org/10.1007/s10462-019-09725-4
DOI: https://doi.org/10.1007/s10462-019-09725-4