Published by De Gruyter Mouton November 30, 2023

Contextualized word senses: from attention to compositionality

  • Pablo Gamallo
From the journal Linguistics Vanguard

Abstract

The neural architectures of language models are becoming increasingly complex, especially that of Transformers, which are based on the attention mechanism. Although their application to numerous natural language processing tasks has proven very fruitful, they remain models with little or no interpretability and explainability. One of the tasks for which they are best suited is encoding the contextual sense of words by means of contextualized embeddings. In this paper we propose a transparent, interpretable, and linguistically motivated strategy for encoding the contextual sense of words by modeling semantic compositionality. Particular attention is given to dependency relations and to semantic notions such as selectional preferences and paradigmatic classes. A partial implementation of the proposed model is carried out and compared with Transformer-based architectures on a specific semantic task, namely computing the similarity of word senses in context. The results show that linguistically motivated models can be competitive with the black boxes underlying complex neural architectures.


Corresponding author: Pablo Gamallo, Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela, Galiza, Spain.

Award Identifier / Grant number: ED431G2019/04

Acknowledgements

This research was funded by the project "Nós – Galician in the society and economy of artificial intelligence", an agreement between the Xunta de Galicia and the University of Santiago de Compostela; ILENIA, from the Spanish Ministry for Economic Affairs and Digital Transformation; LingUMT, grant PID2021-128811OA-I00, MEC; DeepR, grant TED2021-130295B-C31, MEC; Big-eRisk, grant PLEC2021-007662, MEC; and grant ED431G2019/04 from the Galician Ministry of Education, University and Professional Training, and the European Regional Development Fund (ERDF/FEDER program), Groups of Reference: ED431C 2020/21.

  1. Research funding: This work was supported by the Consellería de Cultura, Educación e Ordenación Universitaria (ED431G2019/04).


Received: 2022-10-19
Accepted: 2023-04-12
Published Online: 2023-11-30

© 2023 Walter de Gruyter GmbH, Berlin/Boston

Available at: https://www.degruyter.com/document/doi/10.1515/lingvan-2022-0125/html