A deep learning architecture for semantic address matching, International Journal of Geographical Information Science


A deep learning architecture for semantic address matching
International Journal of Geographical Information Science (IF 5.7), Pub Date: 2019-10-24, DOI: 10.1080/13658816.2019.1681431
Yue Lin¹,², Mengjun Kang¹, Yuyang Wu³, Qingyun Du¹, Tao Liu⁴


Abstract: Address matching is a crucial step in geocoding, which plays an important role in urban planning and management. To date, the unprecedented development of location-based services has generated a large amount of unstructured address data. Traditional address matching methods mainly focus on the literal similarity of address records and are therefore not applicable to the unstructured address data. In this study, we introduce an address matching method based on deep learning to identify the semantic similarity between address records. First, we train the word2vec model to transform the address records into their corresponding vector representations. Next, we apply the enhanced sequential inference model (ESIM), a deep text-matching model, to make local and global inferences to determine if two addresses match. To evaluate the accuracy of the proposed method, we fine-tune the model with real-world address data from the Shenzhen Address Database and compare the outputs with those of several popular address matching methods. The results indicate that the proposed method achieves a higher matching accuracy for unstructured address records, with its precision, recall, and F1 score (i.e., the harmonic mean of precision and recall) reaching 0.97 on the test set.
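The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes address matching is scored as a binary decision per address pair (1 = match, 0 = no match). The function name `precision_recall_f1` and the labels below are hypothetical, introduced only for illustration; they are not the authors' code or data.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels (1 = match, 0 = no match)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels for eight address pairs (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints "precision=0.75 recall=0.75 f1=0.75"
```

Because F1 is a harmonic mean, it equals precision and recall whenever the two coincide, which is consistent with all three values being reported as 0.97.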


Updated: 2019-10-24
