Citations are not opinions: a corpus linguistics approach to understanding how citations are made
arXiv - CS - Digital Libraries. Pub Date: 2021-04-16, DOI: arxiv-2104.08087
Domenic Rosati

Citation content analysis seeks to understand citations based on the language used when a citation is made. A key issue in citation content analysis is finding linguistic structures that characterize distinct classes of citations, in order to understand the intent and function of a citation. Previous works have modeled linguistic features first and then drawn conclusions about the language structures unique to each class of citation function from the performance of a classification task or from inter-annotator agreement. In this study, we start with a large sample of a pre-classified citation corpus, 2 million citations from each class of the scite Smart Citation dataset (supporting, disputing, and mentioning citations), and analyze its corpus linguistics in order to reveal the unique and statistically significant language structures belonging to each type of citation. By generating comparison tables for each citation type, we present a number of linguistic features that uniquely characterize citation type. We find that within citation collocates, there is very low correlation between citation type and sentiment. Additionally, we find that the subjectivity of citation collocates across classes is very low. These findings suggest that the sentiment of collocates is not a predictor of citation function and that, because of their low subjectivity, an opinion-expressing mode of understanding citations, implicit in previous citation sentiment analysis literature, is inappropriate. Instead, we suggest that citations are better understood as claims-making devices, where the citation type can be explained by understanding how two claims are being compared. By presenting this approach, we hope to inspire similar corpus linguistic studies on citations that derive a more robust theory of citation from an empirical basis using citation corpora.
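
As an illustration of the kind of collocate comparison described in the abstract, the following is a minimal sketch only. The abstract does not say which statistic, window size, or tokenization the study uses, so everything here is an assumption: the `[cit]` placeholder token, the window of 3 tokens, the toy sentences, and the choice of the log-likelihood keyness score (the two-term form common in corpus linguistics) are all hypothetical stand-ins.

```python
import math
from collections import Counter


def collocates(sentences, marker="[cit]", window=3):
    """Count tokens that occur within `window` tokens of the citation marker.

    The marker token and whitespace tokenization are illustrative assumptions.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != marker:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Count every neighbour in the window except the marker itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood (G2) for a word seen `a` times in corpus A, `b` in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keyness(counts_a, counts_b):
    """Rank words by how strongly their frequency differs between two classes."""
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    scores = {
        word: log_likelihood(counts_a[word], counts_b[word], total_a, total_b)
        for word in set(counts_a) | set(counts_b)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy sentences, invented for illustration; not drawn from the scite dataset.
    supporting = [
        "our results are consistent with [cit]",
        "this finding agrees with [cit] and confirms earlier work",
    ]
    disputing = [
        "our results are in contrast to [cit]",
        "this finding differs from [cit]",
    ]
    for word, score in keyness(collocates(supporting), collocates(disputing))[:5]:
        print(f"{word}\t{score:.3f}")
```

A word that is equally frequent in both classes scores zero, while words concentrated in one class score higher. On a real corpus the scores would be compared against a significance threshold before a collocate is treated as characteristic of a citation type.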

Updated: 2021-04-19