Acronyms: identification, expansion and disambiguation

Jacobs, Kayla; Itai, Alon; Wintner, Shuly

doi:10.1007/s10472-018-9608-8

Published: 06 December 2018

Volume 88, pages 517–532 (2020)

Annals of Mathematics and Artificial Intelligence

Abstract

Acronyms, words formed from the initial letters of a phrase, are important for various natural language processing applications, including information retrieval and machine translation. While hand-crafted acronym dictionaries exist, they are limited and require frequent updates. We present a new machine-learning-based approach to automatically build an acronym dictionary from unannotated texts. This is the first such technique that specifically handles non-local acronyms, i.e., that can determine an acronym’s expansion even when the expansion does not appear in the same document as the acronym. Our approach automatically enhances the dictionary with contextual information to help address the acronym disambiguation task (selecting the most appropriate expansion for a given acronym in context), outperforming dictionaries built using prior techniques. We apply the approach to Modern Hebrew, a language with a long tradition of using acronyms, in which the productive morphology and unique orthography add to the complexity of the problem.
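To make the disambiguation task concrete, the following is a minimal sketch of choosing an expansion from a dictionary that stores contextual information. It is not the method of the article, which is machine-learning-based and applied to Hebrew. The dictionary contents, the English example, the word-overlap score, and the names `ACRONYM_DICT` and `disambiguate` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy dictionary: each expansion carries context words that are
# assumed to have been observed near it in some corpus (invented values).
ACRONYM_DICT = {
    "ACL": {
        "Association for Computational Linguistics": Counter(
            {"paper": 3, "parsing": 2, "conference": 2, "language": 4}
        ),
        "anterior cruciate ligament": Counter(
            {"knee": 4, "injury": 3, "surgery": 2, "athlete": 1}
        ),
    },
}


def disambiguate(acronym, context_tokens):
    """Return the expansion whose stored context best overlaps the given context.

    Returns None if the acronym is not in the dictionary.
    """
    candidates = ACRONYM_DICT.get(acronym)
    if not candidates:
        return None
    context = Counter(token.lower() for token in context_tokens)

    def overlap(profile):
        # Counter intersection keeps, for each shared word, the smaller count.
        return sum((profile & context).values())

    # max() keeps the first candidate on ties, so dictionary order breaks ties.
    return max(candidates, key=lambda expansion: overlap(candidates[expansion]))
```

Calling `disambiguate("ACL", "She tore her ACL in a knee injury".split())` selects the ligament sense, because only that entry shares context words with the sentence. A plain overlap count is the simplest possible scoring choice; a learned model over richer context features would take its place in a real system.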

Acknowledgments

The authors are grateful to Ran El-Yaniv, Doug Freud, Assaf Glazer, Shie Mannor, and Shaul Markovitz for their machine learning advice. We thank Rafi Cohen for his help with LDA, Nachum Dershowitz for his historical acronym guidance, Chaim Kutnicki for his efficient coding support, Tomer Ashur and Sela Ferdman for their pre-processing of the Wikipedia corpus, and Josh Wortman for his dictionary assistance. Statistically significant improvements to our math were provided by Nicholas Mader, Breanna Miller, Tony Rieser, Zach Seeskin, and Brandon Willard. Thanks to acronym annotators Yosi Atia, Hannah Fadida, Limor Leibovich, Lior Leibovich, Shachar Maidenbaum, Elisheva Rotman, and Beny Shlevich. This research was supported by the Israel Science Foundation (grant No. 1269/07).

Author information

Authors and Affiliations

Computer Science Department, Technion, Haifa, 32000, Israel
Kayla Jacobs & Alon Itai
Department of Computer Science, University of Haifa, Haifa, 31905, Israel
Shuly Wintner

Corresponding author

Correspondence to Shuly Wintner.

About this article

Cite this article

Jacobs, K., Itai, A. & Wintner, S. Acronyms: identification, expansion and disambiguation. Ann Math Artif Intell 88, 517–532 (2020). https://doi.org/10.1007/s10472-018-9608-8

Published: 06 December 2018
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10472-018-9608-8

Keywords

Mathematics Subject Classification (2010)

68T50
