Sherloc: a knowledge-driven algorithm for geolocating microblog messages at sub-city level
International Journal of Geographical Information Science (IF 4.3), Pub Date: 2020-06-16, DOI: 10.1080/13658816.2020.1764003
Laura Di Rocco¹, Federico Dassereto², Michela Bertolotto³, Davide Buscaldi⁴, Barbara Catania², Giovanna Guerrini²

ABSTRACT Many solutions exist for coarsely geolocating users at the time they post a message. However, many important applications, such as traffic monitoring and event detection, require finer geolocation at the level of city neighborhoods, i.e., at a sub-city level. Data-driven approaches often cannot guarantee good accuracy and efficiency at this level, because the number of sub-city positions to be estimated is higher and large, balanced training sets are rarely available. We claim that external information sources can overcome these limitations of data-driven approaches in achieving good accuracy for sub-city level geolocation, and we present a knowledge-driven approach that achieves good results once the reference area of a message is known. Our algorithm, called Sherloc, exploits the toponyms in a message, extracts their semantics from a geographic gazetteer, and embeds them into a metric space that captures the semantic distance among them. We identify the toponyms semantically closest to a message and then cluster them according to their spatial locations. Sherloc requires no prior training, can infer the location at sub-city level with high accuracy, and is not limited to geolocating on a fixed spatial grid.
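The two-stage idea described in the abstract (rank gazetteer toponyms by semantic distance to the message, then cluster the closest ones spatially) can be illustrated with a minimal Python sketch. This is not Sherloc's implementation. The `Toponym` schema, the use of Jaccard distance over bag-of-words term sets as a stand-in for the paper's metric embedding, the parameters `k` and `radius_m`, single-linkage clustering, and the "centroid of the largest cluster" rule are all assumptions made for illustration. The coordinates in `GAZETTEER` are illustrative and unverified, and the last entry is a fictitious distractor.

```python
import math
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Toponym:
    """Hypothetical gazetteer entry: a name, descriptive terms, and a point location."""
    name: str
    terms: FrozenSet[str]  # e.g. name words plus feature types from the gazetteer record
    lat: float
    lon: float


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard distance 1 - |A & B| / |A | B|, a metric on finite sets (assumed stand-in)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def haversine_m(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000.0 * math.asin(math.sqrt(min(1.0, h)))


def closest_toponyms(terms: FrozenSet[str], gazetteer: List[Toponym], k: int) -> List[Toponym]:
    """The k entries semantically closest to the message, ignoring entries sharing no term."""
    related = [t for t in gazetteer if jaccard_distance(terms, t.terms) < 1.0]
    return sorted(related, key=lambda t: jaccard_distance(terms, t.terms))[:k]


def spatial_clusters(toponyms: List[Toponym], radius_m: float) -> List[List[Toponym]]:
    """Single-linkage clustering: entries chained by hops of at most radius_m share a cluster."""
    clusters = []
    unvisited = list(toponyms)
    while unvisited:
        frontier = [unvisited.pop()]
        cluster = []
        while frontier:
            current = frontier.pop()
            cluster.append(current)
            near = [t for t in unvisited
                    if haversine_m((current.lat, current.lon), (t.lat, t.lon)) <= radius_m]
            for t in near:
                unvisited.remove(t)
            frontier.extend(near)
        clusters.append(cluster)
    return clusters


def estimate_location(message: str, gazetteer: List[Toponym],
                      k: int = 3, radius_m: float = 500.0) -> Optional[Tuple[float, float]]:
    """Centroid of the largest spatial cluster among the message's k closest toponyms.

    The gazetteer is assumed to be already restricted to the message's reference area.
    Returns None when no toponym shares a term with the message.
    """
    terms = frozenset(re.findall(r"\w+", message.lower()))
    candidates = closest_toponyms(terms, gazetteer, k)
    if not candidates:
        return None
    best = max(spatial_clusters(candidates, radius_m), key=len)
    return (sum(t.lat for t in best) / len(best), sum(t.lon for t in best) / len(best))


# Illustrative gazetteer; coordinates are approximate and unverified, last entry is fictitious.
GAZETTEER = [
    Toponym("Piazza De Ferrari", frozenset({"piazza", "de", "ferrari", "square"}), 44.4071, 8.9338),
    Toponym("Via XX Settembre", frozenset({"via", "xx", "settembre", "street"}), 44.4065, 8.9375),
    Toponym("Porto Antico", frozenset({"porto", "antico", "harbour"}), 44.4098, 8.9270),
    Toponym("Piazza Ferrari", frozenset({"piazza", "ferrari", "square"}), 45.0000, 9.5000),
]

if __name__ == "__main__":
    print(estimate_location("Traffic jam in Piazza De Ferrari towards Via XX Settembre", GAZETTEER))
```

In this toy run, the far-away "Piazza Ferrari" is among the three semantically closest entries but ends up in a singleton cluster, so the two nearby toponyms reinforce each other and their midpoint is returned. Returning a point derived from the matched toponyms, rather than a grid cell, reflects the abstract's claim that the estimate is not tied to a fixed spatial grid; `radius_m` controls how closely toponyms must co-locate to support one another.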


