Siamese networks for large-scale author identification, Computer Speech & Language



Siamese networks for large-scale author identification
Computer Speech & Language (IF 3.1), Pub Date: 2021-05-12, DOI: 10.1016/j.csl.2021.101241
Chakaveh Saedi, Mark Dras

Authorship attribution is the process of identifying the author of a text. Approaches to tackling it have been conventionally divided into classification-based ones, which work well for small numbers of candidate authors, and similarity-based methods, which are applicable for larger numbers of authors or for authors beyond the training set; these existing similarity-based methods have only embodied static notions of similarity. Deep learning methods, which blur the boundaries between classification-based and similarity-based approaches, are promising in terms of ability to learn a notion of similarity, but have previously only been used in a conventional small-closed-class classification setup.

Siamese networks have been used to develop learned notions of similarity in one-shot image tasks, and also for tasks of mostly semantic relatedness in NLP. We examine their application to the stylistic task of authorship attribution on datasets with large numbers of authors, looking at multiple energy functions and neural network architectures, and show that they can substantially outperform previous approaches.
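To make the idea of a learned similarity concrete, here is a minimal sketch of a Siamese setup in plain Python. It is an illustration only, not the architecture or features used in the paper: the hashed character-trigram features, the single tanh layer, the names `featurize`, `Encoder`, `pair_energy`, and `attribute`, and the choice of contrastive loss are all assumptions made for this example. The weights are random and untrained; in practice they would be fitted by minimising the loss over same-author and different-author pairs in a deep learning framework.

```python
import math
import random
import zlib

DIM = 64  # hypothetical size of the hashed feature vector


def featurize(text, n=3):
    """Hash character n-gram counts into a fixed-length vector (stand-in features)."""
    vec = [0.0] * DIM
    for i in range(len(text) - n + 1):
        gram = text[i:i + n].encode("utf-8")
        vec[zlib.crc32(gram) % DIM] += 1.0
    return vec


class Encoder:
    """One tanh layer; the SAME instance encodes both inputs (shared weights)."""

    def __init__(self, in_dim=DIM, out_dim=16, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]

    def __call__(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.w]


def euclidean_energy(a, b):
    """Energy as Euclidean distance between the two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_energy(a, b):
    """Energy as 1 - cosine similarity; 1.0 if either embedding is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def pair_energy(encoder, text_a, text_b, energy=euclidean_energy):
    """Run both texts through the shared encoder and compare the embeddings."""
    return energy(encoder(featurize(text_a)), encoder(featurize(text_b)))


def contrastive_loss(energy, same_author, margin=1.0):
    """One common pairwise loss: pull same-author pairs together,
    push different-author pairs apart until they exceed the margin."""
    if same_author:
        return energy ** 2
    return max(0.0, margin - energy) ** 2


def attribute(encoder, query, known_samples, energy=euclidean_energy):
    """Return the candidate author whose known sample has the lowest energy."""
    return min(
        known_samples,
        key=lambda author: pair_energy(encoder, query, known_samples[author], energy),
    )


if __name__ == "__main__":
    encoder = Encoder()
    known = {
        "author_a": "The committee met at noon and adjourned shortly after.",
        "author_b": "honestly i think the whole thing was kind of a mess lol",
    }
    print(attribute(encoder, "tbh the meeting was sort of a mess", known))
```

Because attribution here reduces to comparing a query against one known sample per candidate, adding a new author only requires adding a sample to `known_samples`, with no retraining of a fixed-size output layer. That is the property that makes similarity-based methods suitable for large or open sets of authors.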


Updated: 2021-06-04
