-
LLM-Assisted Data Augmentation for Chinese Dialogue-Level Dependency Parsing Comput. Linguist. (IF 9.3) Pub Date : 2024-03-12 Meishan Zhang, Gongyao Jiang, Shuang Liu, Jing Chen, Min Zhang
Dialogue-level dependency parsing, despite its growing academic interest, often encounters underperformance issues due to resource shortages. A potential solution to this challenge is data augmentation. In recent years, large language models (LLMs) have demonstrated strong generation capabilities, which can greatly facilitate data augmentation. In this study, we focus on Chinese dialogue-level dependency
-
A Systematic Review of Computational Approaches to Deciphering Bronze Age Aegean and Cypriot Scripts Comput. Linguist. (IF 9.3) Pub Date : 2024-03-08 Maja Braović, Damir Krstinić, Maja Štula, Antonia Ivanda
This paper provides a detailed insight into computational approaches for deciphering Bronze Age Aegean and Cypriot scripts, namely the Archanes script and the Archanes formula, Phaistos Disk, Cretan hieroglyphic (including the Malia Altar Stone and Arkalochori Axe), Linear A, Linear B, Cypro-Minoan and Cypriot scripts. The unique contributions of this paper are threefold: 1) a thorough review of major
-
A Novel Alignment-based Approach for PARSEVAL Measures Comput. Linguist. (IF 9.3) Pub Date : 2024-03-04 Eunkyul Leah Jo, Angela Yoonseo Park, Jungyeul Park
We propose a novel method for calculating PARSEVAL measures to evaluate constituent parsing results. Previous constituent parsing evaluation techniques were constrained by the requirement for consistent sentence boundaries and tokenization results, proving to be stringent and inconvenient. Our new approach handles constituent parsing results obtained from raw text, even when sentence boundaries and
-
Towards Faithful Model Explanation in NLP: A Survey Comput. Linguist. (IF 9.3) Pub Date : 2024-01-22 Qing Lyu, Marianna Apidianaki, Chris Callison-Burch
End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model’s prediction. In this survey, we review over 110 model explanation methods
-
The Pitfalls of Defining Hallucination Comput. Linguist. (IF 9.3) Pub Date : 2024-01-19 Kees van Deemter
Despite impressive advances in Natural Language Generation (NLG) and Large Language Models (LLMs), researchers are still unclear about important aspects of NLG evaluation. To substantiate this claim, I examine current classifications of hallucination and omission in data-to-text NLG, and I propose a logic-based synthesis of these classifications. I conclude by highlighting some remaining limitations of
-
Context-aware Transliteration of Romanized South Asian Languages Comput. Linguist. (IF 9.3) Pub Date : 2024-01-19 Christo Kirov, Cibu Johny, Anna Katanova, Alexander Gutkin, Brian Roark
While most transliteration research is focused on single tokens such as named entities – e.g., transliteration of “અમદાવાદ” from the Gujarati script to the Latin script “Ahmedabad” – the informal romanization prevalent in South Asia and elsewhere often requires transliteration of full sentences. The lack of large parallel text collections of full sentence (as opposed to single word) transliterations
-
Common Flaws in Running Human Evaluation Experiments in NLP Comput. Linguist. (IF 9.3) Pub Date : 2024-01-08 Craig Thomson, Ehud Reiter, Anya Belz
While conducting a coordinated set of repeat runs of human evaluation experiments in NLP, we discovered flaws in every single experiment we selected for inclusion via a systematic process. In this paper, we describe the types of flaws we discovered which include coding errors (e.g., loading the wrong system outputs to evaluate), failure to follow standard scientific practice (e.g., ad hoc exclusion
-
Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion Comput. Linguist. (IF 9.3) Pub Date : 2024-01-08 Anton Thielmann, Arik Reuter, Quentin Seifert, Elisabeth Bergherr, Benjamin Säfken
Extracting and identifying latent topics in large text corpora has gained increasing importance in Natural Language Processing (NLP). Most models, whether probabilistic models similar to Latent Dirichlet Allocation (LDA) or neural topic models, follow the same underlying approach of topic interpretability and topic extraction. We propose a method that incorporates a deeper understanding of both sentence
-
A Bayesian approach to uncertainty in word embedding bias estimation Comput. Linguist. (IF 9.3) Pub Date : 2024-01-08 Alicja Dobrzeniecka, Rafal Urbaniak
Multiple measures, such as WEAT or MAC, attempt to quantify the magnitude of bias present in word embeddings in terms of a single-number metric. However, such metrics and the related statistical significance calculations rely on treating pre-averaged data as individual data points and employing bootstrapping techniques with low sample sizes. We show that similar results can be easily obtained using
-
UG-schematic Annotation for Event Nominals: A Case Study in Mandarin Chinese Comput. Linguist. (IF 9.3) Pub Date : 2023-12-19 Wenxi Li, Guy Emerson, Yutong Zhang, Weiwei Sun
Divergence of languages observed at the surface level is a major challenge encountered by multilingual data representation, especially when typologically distant languages are involved. Drawing inspiration from a formalist Chomskyan perspective on language universals, Universal Grammar (UG), this article employs deductively pre-defined universals to analyse a multilingually heterogeneous phenomenon
-
Assessing the Cross-linguistic Utility of Abstract Meaning Representation Comput. Linguist. (IF 9.3) Pub Date : 2023-12-19 Shira Wein, Nathan Schneider
Semantic representations capture the meaning of a text. Abstract Meaning Representation (AMR), a type of semantic representation, focuses on predicate-argument structure and abstracts away from surface form. Though AMR was developed initially for English, it has now been adapted to a multitude of languages in the form of non-English annotation schemas, cross-lingual text-to-AMR parsing, and AMR-to-(non-English)
-
Stance Detection with Explanations Comput. Linguist. (IF 9.3) Pub Date : 2023-12-12 Rudra Ranajee Saha, Raymond T. Ng, Laks V. S. Lakshmanan
Identification of stance has recently gained a lot of attention with the extreme growth of fake news and filter bubbles. Over the last decade, many feature-based and deep-learning approaches have been proposed to solve Stance Detection. However, almost none of the existing works focus on providing a meaningful explanation for their prediction. In this work, we study Stance Detection with an emphasis
-
Polysemy - Evidence from Linguistics, Behavioural Science and Contextualised Language Models Comput. Linguist. (IF 9.3) Pub Date : 2023-12-12 Janosch Haber, Massimo Poesio
Polysemy is the type of lexical ambiguity where a word has multiple distinct but related interpretations. In the past decade, it has been the subject of a great many studies across multiple disciplines including linguistics, psychology, neuroscience, and computational linguistics, which have made it increasingly clear that the complexity of polysemy precludes simple, universal answers, especially concerning
-
Can Large Language Models Transform Computational Social Science? Comput. Linguist. (IF 9.3) Pub Date : 2023-12-12 Caleb Ziems, Omar Shaikh, Zhehao Zhang, William Held, Jiaao Chen, Diyi Yang
Large Language Models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If zero-shot LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the Computational Social Science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools
-
My Tenure as the Editor-in-Chief of Computational Linguistics Comput. Linguist. (IF 9.3) Pub Date : 2023-12-01 Hwee Tou Ng
Time flies, and it has been close to five and a half years since I became the editor-in-chief of Computational Linguistics on 15 July 2018. In this editorial, I will describe the changes that I have introduced at the journal, and highlight the achievements and challenges of the journal.
-
Languages Through the Looking Glass of BPE Compression Comput. Linguist. (IF 9.3) Pub Date : 2023-12-01 Ximena Gutierrez-Vasques, Christian Bentz, Tanja Samardžić
Byte-pair encoding (BPE) is widely used in NLP for performing subword tokenization. It uncovers redundant patterns for compressing the data, and hence alleviates the sparsity problem in downstream applications. Subwords discovered during the first merge operations tend to have the most substantial impact on the compression of texts. However, the structural underpinnings of this effect have not been
-
Measuring Attribution in Natural Language Generation Models Comput. Linguist. (IF 9.3) Pub Date : 2023-12-01 Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, David Reitter
Large neural models have brought a new challenge to natural language generation (NLG): It has become imperative to ensure the safety and reliability of the output of models that generate freely. To this end, we present an evaluation framework, Attributable to Identified Sources (AIS), stipulating that NLG output pertaining to the external world is to be verified against an independent, provided source
-
Capturing Fine-Grained Regional Differences in Language Use through Voting Precinct Embeddings Comput. Linguist. (IF 9.3) Pub Date : 2023-12-01 Alex Rosenfeld, Lars Hinrichs
Linguistic variation across a region of interest can be captured by partitioning the region into areas and using social media data to train embeddings that represent language use in those areas. Recent work has focused on larger areas, such as cities or counties, to ensure that enough social media data is available in each area, but larger areas have a limited ability to find fine-grained distinctions
-
The Role of Typological Feature Prediction in NLP and Linguistics Comput. Linguist. (IF 9.3) Pub Date : 2023-11-20 Johannes Bjerva
Computational typology has gained traction in the field of Natural Language Processing (NLP) in recent years, as evidenced by the increasing number of papers on the topic and the establishment of a Special Interest Group on the topic (SIGTYP), including the organization of successful workshops and shared tasks. A considerable amount of work in this sub-field is concerned with prediction of typological
-
How is a “Kitchen Chair” like a “Farm Horse”? Exploring the Representation of Noun-Noun Compound Semantics in Transformer-based Language Models Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Mark Ormerod, Barry Devereux, Jesús Martínez del Rincón
Despite the success of Transformer-based language models in a wide variety of natural language processing tasks, our understanding of how these models process a given input in order to represent task-relevant information remains incomplete. In this work, we focus on semantic composition and examine how Transformer-based language models represent semantic information related to the meaning of English
-
On the Role of Morphological Information for Contextual Lemmatization Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Olia Toporkov, Rodrigo Agerri
Lemmatization is a natural language processing (NLP) task which consists of producing, from a given inflected word, its canonical form or lemma. Lemmatization is one of the basic tasks that facilitate downstream NLP applications, and is of particular importance for highly inflected languages. Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic
-
Universal Generation for Optimality Theory Is PSPACE-Complete Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Sophie Hao
This paper shows that the universal generation problem (Heinz, Kobele, and Riggle 2009) for Optimality Theory (OT, Prince and Smolensky 1993, 2004) is PSPACE-complete. While prior work has shown that universal generation is at least NP-hard (Eisner 1997, 2000b; Wareham 1998; Idsardi 2006) and at most in EXPSPACE (Riggle 2004), our results place universal generation between those two classes, assuming
-
Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Jianhui Pang, Derek Fai Wong, Dayiheng Liu, Jun Xie, Baosong Yang, Yu Wan, Lidia Sam Chao
The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the effectiveness of techniques such as Back-Translation and self-supervised objectives, including Masked Language Modeling, Causal Language Modeling, and Denoising Autoencoding, in improving the performance of machine translation models
-
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Akshay Chaturvedi, Soumadeep Saha, Nicholas Asher, Swarnadeep Bhar, Utpal Garain
Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should
-
Language Model Behavior: A Comprehensive Survey Comput. Linguist. (IF 9.3) Pub Date : 2023-11-15 Tyler A. Chang, Benjamin K. Bergen
Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive
-
Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model Comput. Linguist. (IF 9.3) Pub Date : 2023-09-01 Chris van der Lee, Thiago Castro Ferreira, Chris Emmery, Travis J. Wiltshire, Emiel Krahmer
This study discusses the effect of semi-supervised learning in combination with pretrained language models for data-to-text generation. It is not known whether semi-supervised learning is still helpful when a large-scale language model is also supplemented. This study aims to answer this question by comparing a data-to-text system only supplemented with a language model, to two data-to-text systems
-
Dimensions of Explanatory Value in NLP Models Comput. Linguist. (IF 9.3) Pub Date : 2023-09-01 Kees van Deemter
Performance on a dataset is often regarded as the key criterion for assessing NLP models. I argue for a broader perspective, which emphasizes scientific explanation. I draw on a long tradition in the philosophy of science, and on the Bayesian approach to assessing scientific theories, to argue for a plurality of criteria for assessing NLP models. To illustrate these ideas, I compare some recent models
-
Grammatical Error Correction: A Survey of the State of the Art Comput. Linguist. (IF 9.3) Pub Date : 2023-09-01 Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe
Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated
-
Comparing Selective Masking Methods for Depression Detection in Social Media Comput. Linguist. (IF 9.3) Pub Date : 2023-09-01 Chanapa Pananookooln, Jakrapop Akaranee, Chaklam Silpasuwanchai
Identifying those at risk for depression is a crucial issue, and social media provides an excellent platform for examining the linguistic patterns of depressed individuals. A significant challenge in depression classification problems is ensuring that prediction models are not overly dependent on topic keywords (i.e., depression keywords), such that they fail to predict when such keywords are unavailable
-
Language Embeddings Sometimes Contain Typological Generalizations Comput. Linguist. (IF 9.3) Pub Date : 2023-07-07 Robert Östling, Murathan Kurfalı
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases
-
Generation and Polynomial Parsing of Graph Languages with Non-Structural Reentrancies Comput. Linguist. (IF 9.3) Pub Date : 2023-07-07 Johanna Björklund, Frank Drewes, Anna Jonsson
Graph-based semantic representations are popular in natural language processing (NLP), where it is often convenient to model linguistic concepts as nodes and relations as edges between them. Several attempts have been made to find a generative device that is sufficiently powerful to describe languages of semantic graphs, while at the same time allowing efficient parsing. We contribute to this line of work
-
The Analysis of Synonymy and Antonymy in Discourse Relations: An Interpretable Modeling Approach Comput. Linguist. (IF 9.3) Pub Date : 2023-06-01 Asela Reig Alamillo, David Torres Moreno, Eliseo Morales González, Mauricio Toledo Acosta, Antoine Taroni, Jorge Hermosillo Valadez
The idea that discourse relations are interpreted both by explicit content and by shared knowledge between producer and interpreter is pervasive in discourse and linguistic studies. How much weight should be ascribed in this process to the lexical semantics of the arguments is, however, uncertain. We propose a computational approach to analyze contrast and concession relations in the PDTB corpus. Our
-
Reflection of Demographic Background on Word Usage Comput. Linguist. (IF 9.3) Pub Date : 2023-06-01 Aparna Garimella, Carmen Banea, Rada Mihalcea
The availability of personal writings in electronic format provides researchers in the fields of linguistics, psychology, and computational linguistics with an unprecedented chance to study, on a large scale, the relationship between language use and the demographic background of writers, allowing us to better understand people across different demographics. In this article, we analyze the relation
-
Gradual Modifications and Abrupt Replacements: Two Stochastic Lexical Ingredients of Language Evolution Comput. Linguist. (IF 9.3) Pub Date : 2023-06-01 Michele Pasquini, Maurizio Serva, Davide Vergni
The evolution of the vocabulary of a language is characterized by two different random processes: abrupt lexical replacements, when a complete new word emerges to represent a given concept (which was the basis of Swadesh's founding of glottochronology in the 1950s), and gradual lexical modifications that progressively alter words over the centuries, considered here in detail for the first time
-
From Word Types to Tokens and Back: A Survey of Approaches to Word Meaning Representation and Interpretation Comput. Linguist. (IF 9.3) Pub Date : 2023-06-01 Marianna Apidianaki
Vector-based word representation paradigms situate lexical meaning at different levels of abstraction. Distributional and static embedding models generate a single vector per word type, which is an aggregate across the instances of the word in a corpus. Contextual language models, on the contrary, directly capture the meaning of individual word instances. The goal of this survey is to provide an overview
-
Cross-Lingual Transfer with Language-Specific Subnetworks for Low-Resource Dependency Parsing Comput. Linguist. (IF 9.3) Pub Date : 2023-05-25 Rochelle Choenni, Dan Garrette, Ekaterina Shutova
Large multilingual language models typically share their parameters across all languages, which enables cross-lingual task transfer, but learning can also be hindered when training updates from different languages are in conflict. In this article, we propose novel methods for using language-specific subnetworks, which control cross-lingual parameter sharing, to reduce conflicts and increase positive
-
Machine Learning for Ancient Languages: A Survey Comput. Linguist. (IF 9.3) Pub Date : 2023-05-25 Thea Sommerschield, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, Nando de Freitas
Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in
-
Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Enrica Troiano, Laura Oberländer, Roman Klinger
The most prominent tasks in emotion analysis are to assign emotions to texts and to understand how emotions manifest in language. An important observation for natural language processing is that emotions can be communicated implicitly by referring to events alone, appealing to an empathetic, intersubjective understanding of events, even without explicitly mentioning an emotion name. In psychology,
-
Transformers and the Representation of Biomedical Background Knowledge Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Oskar Wysocki, Zili Zhou, Paul O’Regan, Deborah Ferreira, Magdalena Wysocka, Dónal Landers, André Freitas
Specialized Transformer-based models (such as BioBERT and BioMegatron) are adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine—namely
-
Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Jan-Christoph Klie, Bonnie Webber, Iryna Gurevych
Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising number of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have
-
Curing the SICK and Other NLI Maladies Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Aikaterini-Lida Kalouli, Hai Hu, Alexander F. Webb, Lawrence S. Moss, Valeria de Paiva
Against the backdrop of the ever-improving Natural Language Inference (NLI) models, recent efforts have focused on the suitability of the current NLI datasets and on the feasibility of the NLI task as it is currently approached. Many of the recent studies have exposed the inherent human disagreements of the inference task and have proposed a shift from categorical labels to human subjective probability
-
Pretrained Transformers for Text Ranking: BERT and Beyond Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Suzan Verberne
Pretrained Transformers for Text Ranking: BERT and Beyond by Jimmy Lin, Rodrigo Nogueira, and Andrew Yates (University of Waterloo, University of Campinas, University of Amsterdam). Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 53), 2021, 325 pp; ISBN paperback: 9781636392288; ISBN ebook: 9781636392295; doi:10.2200/S01123ED1V01Y202108HLT053
-
Finite-State Text Processing Comput. Linguist. (IF 9.3) Pub Date : 2023-03-01 Aniello De Santo
Finite-State Text Processing by Kyle Gorman and Richard Sproat (Graduate Center, City University of New York & Google LLC). Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 50), 2021, xvii+140 pp; paperback, ISBN: 9781636391137; ebook, ISBN: 9781636391144; hardcover, ISBN: 9781636391151; doi: 10.2200/S01086ED1V01Y202104HLT050
-
Certified Robustness to Text Adversarial Attacks by Randomized [MASK] Comput. Linguist. (IF 9.3) Pub Date : 2023-02-02 Jiehang Zeng, Jianhan Xu, Xiaoqing Zheng, Xuanjing Huang
Very recently, a few certified defense methods have been developed to provably guarantee the robustness of a text classifier to adversarial synonym substitutions. However, all the existing certified defense methods assume that the defenders have been informed of how the adversaries generate synonyms, which is not a realistic scenario. In this study, we propose a certifiably robust defense method by randomly
-
Data-driven Cross-lingual Syntax: An Agreement Study with Massively Multilingual Models Comput. Linguist. (IF 9.3) Pub Date : 2023-01-13 Andrea Gregor de Varda, Marco Marelli
Massively multilingual models such as mBERT and XLM-R are increasingly valued in Natural Language Processing research and applications, due to their ability to tackle the uneven distribution of resources available for different languages. The models’ ability to process multiple languages relying on a shared set of parameters raises the question of whether the grammatical knowledge they extracted during
-
Onception: Active Learning with Expert Advice for Real World Machine Translation Comput. Linguist. (IF 9.3) Pub Date : 2022-12-15 Vânia Mendonça, Ricardo Rei, Luísa Coheur, Alberto Sardinha
Active learning can play an important role in low-resource settings (i.e., where annotated data is scarce), by selecting which instances may be more worthy to annotate. Most active learning approaches for Machine Translation assume the existence of a pool of sentences in a source language, and rely on human annotators to provide translations or post-edits, which can still be costly. In this article
-
Erratum: Annotation Curricula to Implicitly Train Non-Expert Annotators Comput. Linguist. (IF 9.3) Pub Date : 2022-12-14 Ji-Ung Lee, Jan-Christoph Klie, Iryna Gurevych
The authors of this work (“Annotation Curricula to Implicitly Train Non-Expert Annotators” by Ji-Ung Lee, Jan-Christoph Klie, and Iryna Gurevych in Computational Linguistics 48:2 https://doi.org/10.1162/coli_a_00436) discovered an incorrect inequality symbol in section 5.3 (page 360). The paper stated that the differences in the annotation times for the control instances result in a p-value of 0.200
-
Information Theory–based Compositional Distributional Semantics Comput. Linguist. (IF 9.3) Pub Date : 2022-12-01 Enrique Amigó, Alejandro Ariza-Casabona, Victor Fresno, M. Antònia Martí
In the context of text representation, Compositional Distributional Semantics models aim to fuse the Distributional Hypothesis and the Principle of Compositionality. Text embedding is based on co-occurrence distributions, and the representations are in turn combined by compositional functions taking into account the text structure. However, the theoretical basis of compositional functions is still an
-
Noun2Verb: Probabilistic Frame Semantics for Word Class Conversion Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Lei Yu, Yang Xu
Humans can flexibly extend word usages across different grammatical classes, a phenomenon known as word class conversion. Noun-to-verb conversion, or denominal verb (e.g., to Google a cheap flight), is one of the most prevalent forms of word class conversion. However, existing natural language processing systems are impoverished in interpreting and generating novel denominal verb usages. Previous work
-
A Metrological Perspective on Reproducibility in NLP Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Anya Belz
Reproducibility has become an increasingly debated topic in NLP and ML over recent years, but so far, no commonly accepted definitions of even basic terms or concepts have emerged. The various definitions proposed within NLP/ML not only fail to agree with each other, they are also not aligned with standard scientific definitions. This article examines the standard definitions of repeatability
-
Enhancing Lifelong Language Learning by Improving Pseudo-Sample Generation Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Kasidis Kanwatchara, Thanapapas Horsuwan, Piyawat Lertvittayakumjorn, Boonserm Kijsirikul, Peerapon Vateekul
To achieve lifelong language learning, pseudo-rehearsal methods leverage samples generated from a language model to refresh the knowledge of previously learned tasks. Without proper controls, however, these methods could fail to retain the knowledge of complex tasks with longer texts since most of the generated samples are low in quality. To overcome the problem, we propose three specific contributions
-
Nucleus Composition in Transition-based Dependency Parsing Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Joakim Nivre, Ali Basirat, Luise Durlich, Adam Moss
Dependency-based approaches to syntactic analysis assume that syntactic structure can be analyzed in terms of binary asymmetric dependency relations holding between elementary syntactic units. Computational models for dependency parsing almost universally assume that an elementary syntactic unit is a word, while the influential theory of Lucien Tesnière instead posits a more abstract notion of nucleus
-
Effective Approaches to Neural Query Language Identification Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Xingzhang Ren, Baosong Yang, Dayiheng Liu, Haibo Zhang, Xiaoyu Lv, Liang Yao, Jun Xie
Query language identification (Q-LID) plays a crucial role in a cross-lingual search engine. There exist two main challenges in Q-LID: (1) insufficient contextual information in queries for disambiguation; and (2) the lack of query-style training examples for low-resource languages. In this article, we propose a neural Q-LID model by alleviating the above problems from both model architecture and data
-
Neural Embedding Allocation: Distributed Representations of Topic Models Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Kamrun Naher Keya, Yannis Papanikolaou, James R. Foulds
We propose a method that uses neural embeddings to improve the performance of any given LDA-style topic model. Our method, called neural embedding allocation (NEA), deconstructs topic models (LDA or otherwise) into interpretable vector-space embeddings of words, topics, documents, authors, and so on, by learning neural embeddings to mimic the topic model. We demonstrate that NEA improves coherence
-
The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization Comput. Linguist. (IF 9.3) Pub Date : 2022-11-28 Ildiko Pilan, Pierre Lison, Lilja Ovrelid, Anthi Papadopoulou, David Sanchez, Montserrat Batet
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods. Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources, making it difficult to properly evaluate the level of privacy protection offered
-
It Takes Two Flints to Make a Fire: Multitask Learning of Neural Relation and Explanation Classifiers Comput. Linguist. (IF 9.3) Pub Date : 2022-09-23 Zheng Tang, Mihai Surdeanu
We propose an explainable approach for relation extraction that mitigates the tension between generalization and explainability by jointly training for the two goals. Our approach uses a multi-task learning architecture, which jointly trains a classifier for relation extraction, and a sequence model that labels words in the context of the relation that explain the decisions of the relation classifier
-
Tractable Parsing for CCGs of Bounded Degree Comput. Linguist. (IF 9.3) Pub Date : 2022-09-01 Lena Katharina Schiffer, Marco Kuhlmann, Giorgio Satta
Unlike other mildly context-sensitive formalisms, Combinatory Categorial Grammar (CCG) cannot be parsed in polynomial time when the size of the grammar is taken into account. Refining this result, we show that the parsing complexity of CCG is exponential only in the maximum degree of composition. When that degree is fixed, parsing can be carried out in polynomial time. Our finding is interesting from
-
Linear-Time Calculation of the Expected Sum of Edge Lengths in Random Projective Linearizations of Trees Comput. Linguist. (IF 9.3) Pub Date : 2022-09-01 Lluís Alemany-Puig, Ramon Ferrer-i-Cancho
The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines