Is your document novel? Let attention guide you. An attention-based model for document-level novelty detection, Natural Language Engineering



Natural Language Engineering (IF 2.3), Pub Date: 2020-04-24, DOI: 10.1017/s1351324920000194
Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, Srinivasa Satya Sameer Kumar Chivukula, George Tsatsaronis

Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly operate at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that the simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of $\sim$3% in terms of accuracy.
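
The idea that aligning a target document against its source document(s) can signal novelty can be illustrated with a toy sketch. This is not the authors' architecture, which is a trained deep neural model that draws on a natural language inference dataset. It is a minimal, untrained illustration that assumes each sentence is already represented as a small numeric vector, uses plain dot-product attention for the alignment, and scores novelty as one minus the cosine similarity between each target vector and its attention-weighted source summary. The function names (`align`, `novelty_score`) and the scoring rule are hypothetical choices made only for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(target_vec, source_vecs):
    """Soft-align one target vector to the sources with dot-product attention.

    Returns the attention-weighted average of the source vectors.
    """
    weights = softmax([dot(target_vec, s) for s in source_vecs])
    dim = len(target_vec)
    return [sum(w * s[i] for w, s in zip(weights, source_vecs)) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def novelty_score(target_vecs, source_vecs):
    """Assumed toy score: mean of (1 - cosine) between each target vector
    and its aligned source summary. Higher means less covered by the sources."""
    gaps = [1.0 - cosine(t, align(t, source_vecs)) for t in target_vecs]
    return sum(gaps) / len(gaps)


# Hypothetical sentence vectors; a real system would use learned embeddings.
sources = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
redundant_target = [[1.0, 0.0, 0.0]]  # repeats a source direction
novel_target = [[0.0, 0.0, 1.0]]      # orthogonal to every source

print(novelty_score(redundant_target, sources))
print(novelty_score(novel_target, sources))
```

In this toy setup the target that repeats a source direction receives a lower score than the target orthogonal to every source. A learned model would replace the fixed vectors and the hand-written score with trained components.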


Updated: 2020-04-24
