Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study

Mennes, Julie; Pedersen, Ted; Lefever, Els

doi:10.1007/s10579-019-09455-7

Project Notes
Published: 12 April 2019

Volume 53, pages 889–917 (2019)

Language Resources and Evaluation

327 Accesses
1 Citation

Abstract

Cross-disciplinary communication is often impeded by terminological ambiguity. Hence, cross-disciplinary teams would greatly benefit from a language technology-based tool that allows for the (at least semi-) automated resolution of ambiguous terms. Although no such tool is readily available, an interesting theoretical outline of one does exist. The main obstacle to the concrete realization of this tool is the current lack of an effective method for automatically detecting the different meanings of ambiguous terms across different disciplinary jargons. In this paper, we set up a pilot study to experimentally assess whether the word sense induction technique of ‘context clustering’, as implemented in the software package ‘SenseClusters’, might be a solution. More specifically, given several sets of sentences that come from a cross-disciplinary corpus and contain a specific ambiguous term, we verify whether this technique can classify each sentence in accordance with the meaning of the ambiguous term in that sentence. For the experiments, we first compile a corpus that represents the disciplinary jargons involved in a project on Bone Tissue Engineering. Next, we conduct two series of experiments. The first series focuses on determining appropriate SenseClusters parameter settings using manually selected test data for the ambiguous target terms ‘matrix’ and ‘model’. The second series evaluates the actual performance of SenseClusters using randomly selected test data for an extended set of target terms. We observe that SenseClusters can successfully classify sentences from a cross-disciplinary corpus according to the meaning of the ambiguous term they contain. Hence, we argue that this implementation of context clustering shows potential as a method for the automatic detection of the meanings of ambiguous terms in cross-disciplinary communication.
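The general idea of context clustering can be illustrated with a minimal Python sketch. It is not the SenseClusters implementation (SenseClusters is a separate software package with its own feature selection and clustering options); it only shows the principle under simplifying assumptions: each sentence is reduced to a bag of words taken from a window around the target term, and sentences are then grouped by average-link agglomerative clustering on cosine similarity. The function names, the stopword list, the example sentences for ‘matrix’ and the choice of two clusters are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this illustration.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "is", "and", "for", "with", "by", "on", "we", "that"}


def context_vector(sentence, target, window=6):
    """Count the non-stopword tokens within `window` positions of the first `target`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if target not in tokens:
        return Counter()
    i = tokens.index(target)
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    return Counter(t for t in context if t not in STOPWORDS)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_contexts(sentences, target, k=2, window=6):
    """Average-link agglomerative clustering of sentence contexts into k groups."""
    vectors = [context_vector(s, target, window) for s in sentences]
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the pair of clusters whose members are most similar on average.
        _, x, y = max(
            ((avg_sim(clusters[x], clusters[y]), x, y)
             for x in range(len(clusters))
             for y in range(x + 1, len(clusters))),
            key=lambda t: t[0],
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(sentences)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Invented example sentences: two biological and two mathematical uses of 'matrix'.
sentences = [
    "The cells were seeded in a collagen matrix to support bone tissue growth.",
    "We invert the stiffness matrix to solve the linear equations.",
    "The extracellular matrix gives the cells and bone tissue structural support.",
    "Each row of the matrix stores coefficients of the linear equations.",
]
print(cluster_contexts(sentences, "matrix", k=2))
```

In this toy setting the two biological sentences share context words such as ‘cells’ and ‘tissue’, and the two mathematical sentences share ‘linear’ and ‘equations’, so the sentences are grouped by sense. Real cross-disciplinary data is far sparser, which is why the parameter settings examined in the paper matter.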


Notes

Note that, in this paper, we remain agnostic with respect to the relation between ambiguity and related phenomena like polysemy, fuzziness, vagueness and generality.
There are many general concerns about the use of sense inventory based approaches. For example, there is the difficulty of demarcating the semantic information that should be included in a sense description, and that of distinguishing between closely related senses (Edmonds and Kilgarriff 2002). Relying on sense inventories is particularly problematic in the context of CD projects. Such projects require the compilation of a custom inventory by selecting sense descriptions from existing ‘disciplinary’ sense inventories. Yet, it is unclear how one can ensure that relevant sense descriptions are selected from relevant sense inventories. Moreover, new sense descriptions would need to be developed for terms that are not included in existing inventories.
In this paper, we use the notions ‘word’ and ‘term’ interchangeably, though we use the latter especially when we want to stress that a lexical unit has a meaning.
For more information, go to http://senseclusters.sourceforge.net.
The more specialized a corpus is, the fewer broad definitions it contains. This means that references to more general or high-level components of the meanings of terms will be scarce, and are thus less likely to be picked up by a context clustering technique.
By spanning different disciplines, the corpus becomes highly variegated, as one meaning (e.g. ‘having the capacity to cause rotation’) will often be referred to by different terms (e.g. ‘couple’ in kinematics and ‘force’ in kinetics). This poses a challenge for context clustering, since both term ambiguity and (latent) synonymy are present.
For more information, go to https://www.mtm.kuleuven.be/prometheus.
The sub-corpora do not perfectly mirror the original texts as the accuracy of the recognition results was only sanity-checked.
The underlying reason is that SenseClusters is based on the distributional hypothesis as mentioned earlier in Section 2. See also Subsection 3.4.
We define ‘best result’ as the highest accuracy.
Because the combination of feature type ‘co-occurrences’ and a window size of ‘6’ yielded better results than the combination of feature type ‘co-occurrences’ and a window size of ‘3’, we omitted the latter combination of parameter settings in the fourth round of experiments.
https://www.wordclouds.com

References

Agirre, E., & Edmonds, P. (2006). Word sense disambiguation: Algorithms and applications. Berlin: Springer.
Ankeny, R. A., & Leonelli, S. (2011). What’s so special about model organisms? Studies in History and Philosophy of Science Part A, 42(2), 313–323.
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the association for computational linguistics (pp. 238–247). Baltimore, Maryland, USA: ACL.
Benda, L., Poff, L., Tague, C., Palmer, M., Pizzuto, J., Cooper, S., et al. (2002). How to avoid train wrecks when using science in environmental problem solving. BioScience, 52(12), 1127–1139.
Biemann, C. (2006). Chinese whispers: An efficient graph clustering algorithm and its application to natural language processing problems. In Proceedings of the first workshop on graph based methods for natural language processing, New York City (pp. 73–80).
Bracken, L. J., & Oughton, E. A. (2006). ‘What do you mean?’ The importance of language in developing interdisciplinary research. Transactions of the Institute of British Geographers, 31(3), 371–382.
Church, K., & Hanks, P. (1989). Word association norms, mutual information, and lexicography. In Proceedings of the 27th annual conference of the association of computational linguistics, Vancouver, British Columbia (pp. 76–83).
de Boer, Y., de Gier, A., Verschuur, M., & de Wit, B. (2006). Bruggen bouwen. Onderzoekers over hun ervaringen met interdisciplinair onderzoek in Nederland. RMNO, KNAW, NWO & COS. Retrieved from https://www.knaw.nl/shared/resources/actueel/publicaties/pdf/Bruggen_Bouwen_Onderzoekers_over_interdisciplinair_onderzoek_2006.pdf/view.
Deerwester, S., Dumais, S., Landauer, T., Furnas, G., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
Edmonds, P., & Kilgarriff, A. (2002). Introduction to the special issue on evaluating word sense disambiguation systems. Natural Language Engineering, 8(4), 279–291.
Escudero, G., Màrquez, L., & Rigau, G. (2000). Boosting applied to word sense disambiguation. In R. López de Mántaras & E. Plaza (Eds.), Machine learning: ECML 2000 (pp. 129–141). Berlin: Springer.
Francl, M. (2015). Chemical doublespeak. Nature Chemistry, 7(7), 533.
Hall, T. E., & O’Rourke, M. (2014). Responding to communication challenges in transdisciplinary sustainability science. In Huutoniemi, K. & Tapio, P. (Eds.), Transdisciplinary Sustainability Studies (pp. 135–155). Routledge.
Harris, Z. (1954). Distributional structure. Word, 10(2–3), 146–162.
Harvey, R., & Lund, V. (2007). Biofilms and chronic rhinosinusitis: systematic review of evidence, current concepts and directions for research. Rhinology, 45(1), 3–13.
Heemskerk, M. (2003). Conceptual models as tools for communication across disciplines. Conservation Ecology, 7(3), ??.
Iacobacci, I., Pilehvar, M., & Navigli, R. (2016). Embeddings for word sense disambiguation: An evaluation study. In Proceedings of the 54th annual meeting of the association for computational linguistics (pp. 897–907). Berlin, Germany: ACL.
Karypis, G. (2002). Cluto-a clustering toolkit. Tech. rep., Minnesota Univ Minneapolis Dept of Computer Science.
Klein, J. T. (1996). Crossing boundaries: Knowledge, disciplinarities, and interdisciplinarities. Charlottesville: University of Virginia Press.
Lefever, E., Hoste, V., & De Cock, M. (2011). ParaSense or how to use parallel corpora for word sense disambiguation. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies (pp. 317–322). Portland, Oregon, USA: Association for Computational Linguistics.
Levy, O., & Goldberg, Y. (2014). Dependency-based word embeddings. In Proceedings of the 52nd annual meeting of the association for computational linguistics (pp. 302–308). Baltimore, Maryland, USA: ACL.
Lutter, C. (2015). Comparative approaches to visions of community. History and Anthropology, 26(1), 129–143.
Macken, L., Lefever, E., & Hoste, V. (2013). Texsis: Bilingual terminology extraction from parallel corpora using chunk-based alignment. Terminology International Journal of Theoretical and Applied Issues in Specialized Communication, 19(1), 1–30.
Mennes, J. (2018). SenseDisclosure. A new procedure for dealing with problematically ambiguous terms in cross-disciplinary communication. Language Sciences, 69, 57–67.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013a). Efficient estimation of word representations in vector space. In Proceedings of the international conference on learning representations (ICLR).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013b). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
Naiman, R. (1999). A perspective on interdisciplinary science. Ecosystems, 2(4), 292–295.
Nijhout, H., Reed, M., & Ulrich, C. (2008). Mathematical models of folate-mediated one-carbon metabolism. Vitamins & Hormones, 79, 45–82.
O’Rourke, M., & Crowley, S. J. (2013). Philosophical intervention and cross-disciplinary science: The story of the toolbox project. Synthese, 190, 1–18.
Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199.
Pedersen, T. (2006). Unsupervised corpus-based methods for WSD. In E. Agirre & P. Edmonds (Eds.), Word sense disambiguation: Algorithms and applications (pp. 133–166). Berlin: Springer.
Pedersen, T. (2013). Duluth: Word sense induction applied to web page clustering. In Second joint conference on lexical and computational semantics (* SEM), Volume 2: Proceedings of the seventh international workshop on semantic evaluation (SemEval 2013) (vol. 2, pp. 202–206).
Pedersen, T. (2015). Duluth: Word sense discrimination in the service of lexicography. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015) (pp. 438–442).
Pedersen, T., Purandare, A., & Kulkarni, A. (2005). Name discrimination by clustering similar contexts. In Proceedings of the sixth international conference on intelligent text processing and computational linguistics, Mexico City (pp. 220–231).
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Empirical methods in natural language processing (EMNLP) (pp. 1532–1543).
Purandare, A., & Pedersen, T. (2004). Word sense discrimination by clustering contexts in vector and similarity spaces. In Proceedings of the conference on computational natural language learning, Boston, MA (pp. 41–48).
Salton, G. (1971). The SMART retrieval system: Experiments in automatic document processing. Upper Saddle River, NJ: Prentice-Hall.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.
Serre, D. (2010). Matrices: Theory and applications. Graduate texts in mathematics. (2nd ed.). Springer-Verlag New York.
Spârck Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28(1), 11–21.
Thompson, J. (2009). Building collective communication competence in interdisciplinary research teams. Journal of Applied Communication Research, 37(3), 278–297.
Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
Van de Kauter, M., Coorman, G., Lefever, E., Desmet, B., Macken, L., & Hoste, V. (2013). Lets preprocess: The multilingual LT3 linguistic preprocessing toolkit. Computational Linguistics in the Netherlands Journal, 3, 103–120.
Van de Cruys, T., & Apidianaki, M. (2011). Latent semantic word sense induction and disambiguation. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies, association for computational linguistics, Portland, Oregon, USA (pp. 1476–1485).
Vick, D. W. (2004). Interdisciplinarity and the discipline of law. Journal of Law and Society, 31(2), 163–193.
Yu, L. C., Wang, J., Lai, K., & Zhang, X. (2017). Refining word embeddings for sentiment analysis. In Empirical methods in natural language processing (EMNLP) (pp. 545–550).


Acknowledgements

The work presented in this paper was carried out in the context of a PhD fellowship funded by the Research Foundation—Flanders (FWO). We thank Prof. Dr. Liesbet Geris for sharing her cross-disciplinary experiences as the Scientific Coordinator of Prometheus and providing us with the necessary information for the corpus compilation. We also want to thank Prof. Dr. Stephan van der Waart van Gulik for his constructive feedback which helped to improve the paper significantly.

Author information

Authors and Affiliations

Language and Translation Technology Team, Ghent University, Ghent, Belgium
Julie Mennes & Els Lefever
Department of Computer Science, University of Minnesota, Duluth, MN, 55812, USA
Ted Pedersen


Corresponding author

Correspondence to Julie Mennes.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Mennes, J., Pedersen, T. & Lefever, E. Approaching terminological ambiguity in cross-disciplinary communication as a word sense induction task: a pilot study. Lang Resources & Evaluation 53, 889–917 (2019). https://doi.org/10.1007/s10579-019-09455-7


Published: 12 April 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10579-019-09455-7
