Geographical address representation learning for address matching
World Wide Web (IF 3.7), Pub Date: 2020-02-28, DOI: 10.1007/s11280-020-00782-2
Shuangli Shan, Zhixu Li, Qiang Yang, An Liu, Lei Zhao, Guanfeng Liu, Zhigang Chen

Address matching is a crucial task in many location-based businesses, such as take-out services and express delivery. It aims to identify addresses in an address database that refer to the same location. The task is challenging because the address of a location can be expressed in many different ways, especially in Chinese. Traditional address matching approaches, which rely on string similarity or learned matching rules to identify addresses referring to the same location, can hardly handle addresses that are redundant, incomplete, or unusually expressed. In this paper, to learn geographical semantic representations of address strings, we propose a novel approach that retrieves rich contexts for addresses from the Web through Web search engines, which substantially enriches the semantics that can be learned for each address. In addition, we propose a two-stage geographical address representation learning model for address matching. In the first stage, we use an encoder-decoder architecture to learn a semantic vector representation for each address string, applying an up-sampling and sub-sampling strategy to deal with address redundancy and incompleteness. An attention mechanism is also applied to highlight the important features of each address in its semantic representation. In the second stage, we construct a single large graph from the corpus whose nodes are address elements and addresses, with edges built from word co-occurrence information, and learn embedding representations for all nodes in the graph. Our empirical study on two real-world address datasets shows that our approach greatly improves both precision (by up to 8%) and recall (by up to 12%) over existing state-of-the-art methods.
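The abstract describes the second-stage graph only at a high level. The following is a minimal sketch of how such a corpus graph could be built, assuming a TextGCN-style construction that the abstract does not confirm: address elements and addresses are nodes, address-element edges are weighted by how often the element occurs in the address, and element-element edges are weighted by positive pointwise mutual information (PMI) computed over sliding windows. The function name `build_address_graph`, the `window` parameter, and the pinyin example tokens are hypothetical, and splitting raw address strings into elements is assumed to have been done already.

```python
import math
from collections import Counter
from itertools import combinations


def build_address_graph(addresses, window=3):
    """Hypothetical sketch: weighted graph over address elements and addresses.

    `addresses` is a list of token lists (one per address); tokenization is assumed.
    Returns a dict mapping (node_a, node_b) -> edge weight, where element nodes
    are ("elem", token) and address nodes are ("addr", index).
    """
    edges = {}

    # Address-element edges, weighted by the element's count in the address
    # (an assumed choice; TF-IDF is a common alternative).
    for i, tokens in enumerate(addresses):
        for tok, cnt in Counter(tokens).items():
            edges[(("addr", i), ("elem", tok))] = float(cnt)

    # Collect sliding windows of address elements for co-occurrence statistics.
    windows = []
    for tokens in addresses:
        if not tokens:
            continue
        if len(tokens) <= window:
            windows.append(set(tokens))
        else:
            for start in range(len(tokens) - window + 1):
                windows.append(set(tokens[start:start + window]))

    # Count how many windows contain each element and each element pair.
    n_windows = len(windows)
    single = Counter()
    pair = Counter()
    for w in windows:
        single.update(w)
        pair.update(combinations(sorted(w), 2))

    # Element-element edges, weighted by positive PMI (assumed, TextGCN-style):
    # only pairs that co-occur more often than chance get an edge.
    for (a, b), c in pair.items():
        pmi = math.log(c * n_windows / (single[a] * single[b]))
        if pmi > 0:
            edges[(("elem", a), ("elem", b))] = pmi
    return edges


# Hypothetical usage with three pre-tokenized addresses.
addresses = [
    ["jiangsu", "suzhou", "sip"],
    ["suzhou", "sip", "xinghu_st"],
    ["zhejiang", "hangzhou", "xihu"],
]
graph = build_address_graph(addresses, window=3)
for (u, v), weight in sorted(graph.items()):
    print(u, v, round(weight, 3))
```

In this sketch, elements that never share a window (for example "jiangsu" and "zhejiang") get no edge, so addresses are linked mainly through shared or co-occurring elements. Learning node embeddings on the resulting weighted graph (for example, with a graph convolutional network) and the paper's actual edge-weighting scheme are outside the scope of this sketch.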

Updated: 2020-02-28