Parallel co-location mining with MapReduce and NoSQL systems

  • Regular Paper
  • Knowledge and Information Systems

Abstract

With the rapid growth of georeferenced data, large-scale data processing and analysis methods are needed for spatial big data. Spatial co-location pattern mining is an important problem in spatial data mining: it discovers subsets of features whose objects are frequently located together in geographic proximity. Several existing methods process co-location pattern discovery efficiently; however, they may be insufficient for large, dense spatial data because the mining task consumes a lot of processing time and memory. In this work, we leveraged the power of a modern distributed computing platform, Hadoop, and developed an algorithm (called ParColoc) for parallel co-location mining on the MapReduce framework. This study explored the challenging issues in designing the parallel co-location mining algorithm and addressed them by adopting a spatial declustering technique and a NoSQL system. We conducted an experimental evaluation with real-world and synthetic data to examine the effectiveness of the proposed methods. The experimental results show that ParColoc is a promising method for parallel co-location mining in a cloud computing environment.
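
To make the notion of a co-location pattern concrete, the following is a minimal single-machine sketch in Python. It is not ParColoc and does not use Hadoop or a NoSQL store. It buckets points into grid cells, collects pairs of objects with different features that lie within a distance threshold `d`, and scores a feature pair with the participation index, a prevalence measure widely used in the co-location literature. The abstract does not say which measure ParColoc uses, so the choice of measure is an assumption, and the function names and the tuple layout `(feature, id, x, y)` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def neighbor_pairs(points, d):
    """Return pairs of objects with different features within distance d.

    points: iterable of (feature, object_id, x, y) tuples (assumed layout).
    Each pair is ((feature1, id1), (feature2, id2)) with feature1 < feature2.
    """
    # Bucket points into square grid cells of side d, so that any two points
    # within distance d fall in the same cell or in adjacent cells.
    grid = defaultdict(list)
    for p in points:
        grid[(floor(p[2] / d), floor(p[3] / d))].append(p)

    pairs = set()
    for (cx, cy), members in grid.items():
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells while iterating over the grid.
            for q in grid.get((cx + dx, cy + dy), ()):
                for p in members:
                    # p[0] < q[0] keeps only cross-feature pairs, each once.
                    if p[0] < q[0] and hypot(p[2] - q[2], p[3] - q[3]) <= d:
                        pairs.add(((p[0], p[1]), (q[0], q[1])))
    return pairs


def participation_index(points, pairs, feature_a, feature_b):
    """Participation index of the size-2 pattern {feature_a, feature_b}.

    For each feature, take the fraction of its objects that appear in at
    least one neighbor pair of the pattern; the index is the smaller one.
    """
    totals = defaultdict(int)
    for p in points:
        totals[p[0]] += 1

    a, b = sorted((feature_a, feature_b))
    if totals[a] == 0 or totals[b] == 0:
        return 0.0

    matching = [(p, q) for p, q in pairs if p[0] == a and q[0] == b]
    objects_a = {p for p, _ in matching}
    objects_b = {q for _, q in matching}
    return min(len(objects_a) / totals[a], len(objects_b) / totals[b])
```

A feature pair would be reported as a co-location pattern when its index meets a user-chosen threshold. In a MapReduce setting, the grid cell key is a natural candidate for the map output key, but how ParColoc actually partitions the data is described in the full article, not in this sketch.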

Author information

Corresponding author

Correspondence to Jin Soung Yoo.

About this article

Cite this article

Yoo, J.S., Boulware, D. & Kimmey, D. Parallel co-location mining with MapReduce and NoSQL systems. Knowl Inf Syst 62, 1433–1463 (2020). https://doi.org/10.1007/s10115-019-01381-y

  • DOI: https://doi.org/10.1007/s10115-019-01381-y
