A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching
arXiv - CS - Information Retrieval. Pub Date: 2020-09-17, DOI: arxiv-2009.08114
Mariona Coll Ardanuy, Kasra Hosseini, Katherine McDonough, Amrey Krause, Daniel van Strien and Federico Nanni

Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym previously recognized. While it has traditionally received little attention in the research community, it has been shown that candidate selection has a significant impact on downstream tasks (i.e. entity resolution), especially in noisy or non-standard text. In this paper, we introduce a flexible deep learning method for candidate selection through toponym matching, using state-of-the-art neural network architectures. We perform an intrinsic toponym matching evaluation based on several new realistic datasets, which cover various challenging scenarios (cross-lingual and regional variations, as well as OCR errors). We report its performance on candidate selection in the context of the downstream task of toponym resolution, both on existing datasets and on a new manually-annotated resource of nineteenth-century English OCR'd text.
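
To make the candidate selection task concrete, the following is a minimal sketch of selecting gazetteer candidates for a recognized toponym by string matching. It is not the paper's method: the abstract describes a deep learning matcher, whereas this sketch substitutes a simple character-level similarity from Python's standard `difflib` module. The gazetteer contents, entity IDs, function names, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: entity ID -> known name variants (illustrative only).
GAZETTEER = {
    "Q84": ["London", "Londres", "Londinium"],
    "Q92561": ["London, Ontario", "London"],
    "Q23436": ["Edinburgh", "Edinburg", "Dùn Èideann"],
}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a learned matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_candidates(mention: str, gazetteer: dict, threshold: float = 0.8) -> list:
    """Return (entity_id, score) pairs whose best-matching name variant
    scores at or above the threshold, sorted from best to worst."""
    scored = []
    for entity_id, names in gazetteer.items():
        # Score the mention against every variant and keep the best one.
        best = max(similarity(mention, name) for name in names)
        if best >= threshold:
            scored.append((entity_id, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # An OCR-style misspelling ("u" read as "n") and an ambiguous exact name.
    for mention in ["Edinbnrgh", "London"]:
        print(mention, select_candidates(mention, GAZETTEER))
```

In this sketch, an ambiguous name such as "London" yields more than one candidate, which is the situation a downstream toponym resolution step would then have to disambiguate. A learned matcher of the kind the abstract describes would replace `similarity`, with the aim of coping better with cross-lingual variants and OCR errors than a fixed character-overlap score.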

Updated: 2020-09-23