Is my stance the same as your stance? A cross validation study of stance detection datasets
Information Processing & Management (IF 8.6), Pub Date: 2022-09-05, DOI: 10.1016/j.ipm.2022.103070
Lynnette Hui Xian Ng , Kathleen M. Carley

Stance detection identifies a person’s evaluation of a subject and is a crucial component of many downstream applications. In practice, stance detection requires training a machine learning model on one annotated dataset and applying it to another to predict the stances of text snippets. This cross-dataset model generalization poses three central questions, which we investigate using stance classification models on 7 publicly available English Twitter datasets ranging from 297 to 48,284 instances. (1) Are stance classification models generalizable across datasets? We construct single-dataset models to train and test dataset-against-dataset, and find that models do not generalize well (avg F1=0.33). (2) Can we improve generalizability by aggregating datasets? We find that a multi-dataset model built on the aggregated datasets performs better (avg F1=0.69). (3) Given a model built on multiple datasets, how much additional data is required to fine-tune it? We find it challenging to ascertain a minimum number of data points because performance shows no clear pattern. Investigating possible reasons for the choppy model performance, we find that texts are not easily differentiable by stance, and that annotations are not consistent within or across datasets. Our observations emphasize the need for an aggregated dataset, as well as consistent labels, for models to generalize.
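
The two evaluation setups in questions (1) and (2) can be illustrated with a small, self-contained sketch. This is not the authors' code and makes several assumptions: the classifier is a hypothetical stand-in (multinomial Naive Bayes with add-one smoothing), "avg F1" is taken to be the mean of macro-averaged F1 scores, each dataset is assumed to have `train`/`test` splits, and the multi-dataset model is assumed to be trained on the union of all train splits. The names `NaiveBayes`, `cross_dataset_scores`, `multi_dataset_scores`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def tokenize(text):
    # Whitespace tokenizer; real tweet preprocessing would need more care.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical stand-in stance classifier: multinomial Naive Bayes, add-one smoothing."""

    def fit(self, examples):
        # examples: list of (text, stance_label) pairs
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(count / self.total)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:  # ties keep the label seen first in training
                best_label, best_score = label, score
        return best_label


def macro_f1(gold, pred):
    # Unweighted mean of per-label F1 over labels present in gold or pred.
    f1s = []
    for lab in sorted(set(gold) | set(pred)):
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    return sum(f1s) / len(f1s)


def evaluate(model, examples):
    gold = [label for _, label in examples]
    pred = [model.predict(text) for text, _ in examples]
    return macro_f1(gold, pred)


def cross_dataset_scores(datasets, make_model=NaiveBayes):
    # Single-dataset models: train on dataset `src`, test on every other dataset `tgt`.
    models = {name: make_model().fit(d["train"]) for name, d in datasets.items()}
    return {(src, tgt): evaluate(models[src], datasets[tgt]["test"])
            for src, tgt in permutations(datasets, 2)}


def multi_dataset_scores(datasets, make_model=NaiveBayes):
    # Assumed aggregation: one model trained on the union of all train splits.
    pooled = [ex for d in datasets.values() for ex in d["train"]]
    model = make_model().fit(pooled)
    return {name: evaluate(model, d["test"]) for name, d in datasets.items()}


# Made-up toy datasets (not the seven Twitter datasets from the study).
toy = {
    "A": {"train": [("ban guns now", "favor"), ("guns are rights", "against")],
          "test": [("ban guns", "favor"), ("rights matter", "against")]},
    "B": {"train": [("vaccines save lives", "favor"), ("no vaccine mandates", "against")],
          "test": [("vaccines work", "favor"), ("no mandates", "against")]},
}

cross = cross_dataset_scores(toy)
print("avg cross-dataset macro-F1:", round(sum(cross.values()) / len(cross), 3))
print("multi-dataset macro-F1:", {k: round(v, 3) for k, v in multi_dataset_scores(toy).items()})
```

On the made-up data, each cross-dataset pair scores a macro-F1 of 1/3, because a model trained on one topic sees only unknown words in the other and falls back to a single label, while the pooled model classifies both toy test sets correctly. This only demonstrates the mechanics of the two protocols and says nothing about the study's reported numbers. Training each source model once and reusing it for every target keeps the cross-dataset loop at one fit per dataset.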

Updated: 2022-09-05