Enhancing keyphrase extraction from microblogs using human reading time
Journal of the Association for Information Science and Technology (IF 2.8), Pub Date: 2020-11-19, DOI: 10.1002/asi.24430
Yingyi Zhang, Chengzhi Zhang

Manual keyphrase annotation presupposes reading the content of the object being annotated. Intuitively, readers spend more time on more important words, so human reading time can point to the salient words in a text. Previous studies on keyphrase extraction, however, ignore human reading features. In this article, we aim to leverage human reading time to extract keyphrases from microblog posts. The study has two main tasks. The first is to determine how to measure the time a human spends reading a word. We use eye fixation durations extracted from an open-source eye-tracking corpus (OSEC), and we propose strategies to make eye fixation duration more effective for keyphrase extraction. The second task is to determine how to integrate human reading time into keyphrase extraction models. We propose two novel neural network models. In the first, human reading time serves as the ground truth for the attention mechanism. In the second, human reading time is used as an external feature. Quantitative and qualitative experiments show that the proposed models outperform the baseline models on two microblog datasets.
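The two integration strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the normalization, the loss, or the feature layout, so the function names, the sum-to-one normalization of fixation durations, the mean squared error loss, and the choice to append reading time as one extra vector dimension are all assumptions made for illustration.

```python
import math


def normalize_durations(durations):
    """Scale per-word fixation durations so they sum to 1 (assumed normalization)."""
    total = sum(durations)
    if total == 0:
        # No fixation data available: fall back to a uniform distribution.
        return [1.0 / len(durations)] * len(durations)
    return [d / total for d in durations]


def softmax(scores):
    """Numerically stable softmax over raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def attention_supervision_loss(attn_scores, durations):
    """Strategy 1 (sketch): treat normalized reading time as the attention target.

    Mean squared error is an assumed loss; the abstract does not name one.
    """
    attn = softmax(attn_scores)
    target = normalize_durations(durations)
    return sum((a - t) ** 2 for a, t in zip(attn, target)) / len(attn)


def append_reading_time_feature(embeddings, durations):
    """Strategy 2 (sketch): add normalized reading time as one extra input dimension."""
    target = normalize_durations(durations)
    return [vec + [t] for vec, t in zip(embeddings, target)]


# Hypothetical three-word post with made-up fixation durations in milliseconds.
durations = [100.0, 300.0, 100.0]
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
loss = attention_supervision_loss([0.0, 1.0, 0.0], durations)
features = append_reading_time_feature(embeddings, durations)
```

In a real model, a loss of this kind would be added to the keyphrase tagging loss during training, while the feature variant only changes the input representation and leaves the attention mechanism unsupervised.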

Updated: 2020-11-19