Large scale geospatial data conflation: A feature matching framework based on optimization and divide-and-conquer
Computers, Environment and Urban Systems (IF 6.454), Pub Date: 2021-02-26, DOI: 10.1016/j.compenvurbsys.2021.101618
Ting L. Lei

Geospatial data conflation is the process of combining two datasets to create a better one. It has received increased research attention due to the emergence of new data sources and the need to combine information from these sources in spatial analyses. Many conflation methods exist to date, ranging from simple ones based on spatial join to sophisticated methods based on statistics and optimization models. This paper focuses on the optimization-based conflation approach, which treats feature matching as an optimization problem: find a plan for matching features in the two datasets that minimizes the total discrepancy. Optimization-based conflation methods may overcome some limitations of conventional methods, such as sub-optimality and greediness. However, they have often been deemed impractical in day-to-day analysis because they incur high computational costs, especially when combining large geospatial datasets.
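To make the contrast between greedy and optimization-based matching concrete, here is a minimal sketch under stated assumptions. It is not the paper's model: the abstract names network-flow models, whereas this example uses exhaustive search over one-to-one plans, which is only feasible for a handful of features. Point features, Euclidean distance as the discrepancy measure, and all function names (`distance_matrix`, `greedy_match`, `optimal_match`, `total_cost`) are assumptions made for illustration.

```python
from itertools import permutations
from math import hypot


def distance_matrix(a_pts, b_pts):
    """Euclidean distance between every feature in A and every feature in B."""
    return [[hypot(ax - bx, ay - by) for (bx, by) in b_pts] for (ax, ay) in a_pts]


def greedy_match(cost):
    """Repeatedly take the cheapest remaining pair (locally best choice)."""
    pairs = sorted(
        (cost[i][j], i, j) for i in range(len(cost)) for j in range(len(cost[0]))
    )
    used_a, used_b, plan = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_a and j not in used_b:
            plan[i] = j
            used_a.add(i)
            used_b.add(j)
    return plan


def optimal_match(cost):
    """Exhaustive search for the one-to-one plan with minimum total cost.

    Assumes a square cost matrix; factorial time, so tiny inputs only.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return dict(enumerate(best))


def total_cost(cost, plan):
    """Total discrepancy of a matching plan."""
    return sum(cost[i][j] for i, j in plan.items())


# Hypothetical toy data: two features per dataset, all on the x-axis.
a_pts = [(0, 0), (3, 0)]
b_pts = [(1, 0), (-2, 0)]
cost = distance_matrix(a_pts, b_pts)

print(total_cost(cost, greedy_match(cost)))   # 6.0
print(total_cost(cost, optimal_match(cost)))  # 4.0
```

In this toy case the greedy plan takes the closest pair first (distance 1) and is then forced into a distant pair (distance 5), while the minimum-total-discrepancy plan accepts two pairs at distance 2 each.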

In this paper, we demonstrate the feasibility of performing optimization-based conflation for large geographic data in Geographic Information Systems. This is accomplished by utilizing efficient network-flow-based conflation models and a divide-and-conquer strategy that allows the conflation models to scale to large data. Experiments show that the network-flow-based model achieves average recall and precision rates of 97.7% and 90.8%, respectively, in small test areas, and outperforms the traditional assignment problem by about 9% each. For larger data, the original network-flow model (without divide-and-conquer) took nearly two days to conflate the road network in a portion of the Los Angeles area near Los Angeles International Airport (LAX). By contrast, with the divide-and-conquer strategy, the same model can conflate the road networks of the entire Los Angeles County, CA in under 3 hours.
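The divide-and-conquer idea and the recall and precision measures can be sketched as follows. This is an illustrative sketch, not the paper's procedure: the abstract does not say how the data are partitioned or how features near partition boundaries are handled. The square grid tiles, the greedy per-tile matcher (standing in for the network-flow model), the distance `cutoff`, and every function name here are assumptions.

```python
from collections import defaultdict
from math import floor, hypot


def partition(points, tile_size):
    """Group point indices by the square grid tile that contains them."""
    tiles = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        tiles[(floor(x / tile_size), floor(y / tile_size))].append(idx)
    return tiles


def match_tile(a_pts, b_pts, a_ids, b_ids, cutoff):
    """Placeholder per-tile solver: greedy nearest pairs within a cutoff."""
    pairs = sorted(
        (hypot(a_pts[i][0] - b_pts[j][0], a_pts[i][1] - b_pts[j][1]), i, j)
        for i in a_ids
        for j in b_ids
    )
    used_a, used_b, plan = set(), set(), {}
    for dist, i, j in pairs:
        if dist <= cutoff and i not in used_a and j not in used_b:
            plan[i] = j
            used_a.add(i)
            used_b.add(j)
    return plan


def conflate_by_tiles(a_pts, b_pts, tile_size, cutoff):
    """Solve each tile independently, then merge the per-tile plans."""
    a_tiles = partition(a_pts, tile_size)
    b_tiles = partition(b_pts, tile_size)
    plan = {}
    for key, a_ids in a_tiles.items():
        plan.update(match_tile(a_pts, b_pts, a_ids, b_tiles.get(key, []), cutoff))
    return plan


def precision_recall(plan, truth):
    """Precision and recall of predicted matches against known true matches."""
    predicted, actual = set(plan.items()), set(truth.items())
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data: the third pair straddles the tile boundary at x = 10.
a_pts = [(1, 1), (11, 1), (9.5, 5)]
b_pts = [(1.2, 1), (11.1, 1.1), (10.5, 5)]
truth = {0: 0, 1: 1, 2: 2}

plan = conflate_by_tiles(a_pts, b_pts, tile_size=10, cutoff=2)
print(plan)                           # {0: 0, 1: 1}
print(precision_recall(plan, truth))  # precision 1.0, recall about 0.667
```

Each tile is solved on its own, so the cost of the per-tile solver depends on tile size rather than on the size of the whole dataset. The toy data also show the main hazard of naive tiling: the pair that straddles a tile boundary is never compared, which lowers recall. A practical scheme would need overlapping tiles or some other boundary treatment.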




Updated: 2021-02-26