Entity summarization: State of the art and future challenges

Journal of Web Semantics (IF 2.1), Pub Date: 2021-05-21, DOI: 10.1016/j.websem.2021.100647
Qingxia Liu, Gong Cheng, Kalpa Gunaratna, Yuzhong Qu

The increasing availability of semantic data has substantially enhanced Web applications. Semantic data such as RDF data is commonly represented as entity-property-value triples. The magnitude of semantic data, in particular the large number of triples describing an entity, could overload users with excessive amounts of information. This has motivated fruitful research on automated generation of summaries for entity descriptions to satisfy users’ information needs efficiently and effectively. We focus on this prominent topic of entity summarization, and our research objective is to present the first comprehensive survey of entity summarization research. Rather than separately reviewing each method, our contributions include (1) identifying and classifying technical features of existing methods to form a high-level overview, (2) identifying and classifying frameworks adopted by existing methods for combining multiple technical features, (3) collecting known benchmarks for intrinsic evaluation and efforts toward extrinsic evaluation, and (4) suggesting research directions for future work. By investigating the literature, we synthesized two hierarchies of techniques. The first hierarchy categorizes generic technical features into several perspectives: frequency and centrality, informativeness, and diversity and coverage. In the second hierarchy we present domain-specific and task-specific technical features, including the use of domain knowledge, context awareness, and personalization. Our review demonstrated that existing methods are mainly unsupervised and combine multiple technical features using various frameworks: random surfer models, similarity-based grouping, MMR-like re-ranking, or combinatorial optimization. We also found a few deep-learning-based methods in recent research. Current evaluation results and our case study showed that the problem of entity summarization is still far from being solved.
Based on the limitations of existing methods revealed in the review, we identified several future directions: the use of semantics, human factors, machine and deep learning, non-extractive methods, and interactive methods.
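To make one of the combination frameworks named above concrete, here is a minimal sketch of MMR-like (maximal marginal relevance) re-ranking applied to an entity description. It assumes each triple already carries a relevance score (which, in a real system, might come from frequency/centrality or informativeness features) and uses a deliberately crude, hypothetical similarity: two triples are fully redundant if they share a property, and unrelated otherwise. The function names `mmr_summarize` and `same_property_similarity`, the trade-off weight `lam=0.7`, and the DBpedia-style example triples and scores are illustrative assumptions, not taken from the survey or from any specific surveyed method.

```python
from typing import Callable, List, Tuple

# An RDF-style triple: (entity, property, value).
Triple = Tuple[str, str, str]


def same_property_similarity(a: Triple, b: Triple) -> float:
    """Hypothetical similarity: 1.0 if two triples share a property, else 0.0."""
    return 1.0 if a[1] == b[1] else 0.0


def mmr_summarize(
    triples: List[Triple],
    relevance: Callable[[Triple], float],
    similarity: Callable[[Triple, Triple], float],
    k: int,
    lam: float = 0.7,
) -> List[Triple]:
    """Greedily pick up to k triples, trading relevance against redundancy."""
    selected: List[Triple] = []
    candidates = list(triples)
    while candidates and len(selected) < k:
        def mmr_score(t: Triple) -> float:
            # Redundancy = highest similarity to any triple already in the summary.
            redundancy = max((similarity(t, s) for s in selected), default=0.0)
            return lam * relevance(t) - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Illustrative entity description with made-up relevance scores.
desc = [
    ("dbr:Tim_Berners-Lee", "rdf:type", "dbo:Person"),
    ("dbr:Tim_Berners-Lee", "rdf:type", "dbo:Scientist"),
    ("dbr:Tim_Berners-Lee", "dbo:birthPlace", "dbr:London"),
    ("dbr:Tim_Berners-Lee", "dbo:knownFor", "dbr:World_Wide_Web"),
]
scores = {desc[0]: 0.9, desc[1]: 0.85, desc[2]: 0.5, desc[3]: 0.8}

summary = mmr_summarize(desc, scores.__getitem__, same_property_similarity, k=3)
print([p for _, p, _ in summary])  # ['rdf:type', 'dbo:knownFor', 'dbo:birthPlace']
```

Ranking by relevance alone would return both `rdf:type` triples in a top-3 summary; the redundancy penalty instead promotes the lower-scored `dbo:birthPlace` triple, which is the diversity-and-coverage effect that MMR-like re-ranking is meant to provide.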

Updated: 2021-06-09
