Abstract
The tremendous increase in the use of mobile devices equipped with GPS and other location sensors has resulted in the generation of a huge amount of movement data. In recent years, mining this data to understand the collective mobility behavior of humans, animals and other objects has become popular. Numerous mobility patterns, and algorithms for mining them, have been proposed, each representing a specific movement behavior. The convoy pattern is one such pattern, which can be used to find groups of people moving together in public transport or to prevent traffic jams. A convoy is a set of at least m objects moving together for at least k consecutive time stamps, where m and k are user-defined parameters. Existing algorithms for detecting convoy patterns do not scale to real-life dataset sizes. Therefore, in this paper, we propose a generic distributed convoy pattern mining algorithm called DCM and show how such an algorithm can be implemented using the MapReduce framework. We present a cost model for DCM and a detailed theoretical analysis backed by experimental results. We show the effect of partition size on the performance of DCM. The results of our experiments on different datasets and hardware setups show that our distributed algorithm is scalable in terms of data size and number of nodes, and more efficient than any existing sequential or distributed convoy pattern mining algorithm, showing speed-ups of up to 16 times over SPARE, the state-of-the-art distributed co-movement pattern mining framework. DCM is thus able to process large datasets that SPARE is unable to handle.
References
Aung HH, Tan KL (2010) Discovery of evolving convoys. In: International conference on scientific and statistical database management. Springer, pp 196–213
Brinkhoff T (2000) Generating network-based moving objects. In: Scientific and statistical database management, 2000. Proceedings. 12th international conference on. IEEE, pp 253–255
Brinkhoff T (2002) A framework for generating network-based moving objects. GeoInformatica 6(2):153–180
Chen TS, Chang CY (2002) Skewed data partition and alignment techniques for compiling programs on distributed memory multicomputers. J Supercomput 21(2):191–211
Dai BR, Lin I, et al. (2012) Efficient map/reduce-based dbscan algorithm with optimized data partition. In: Cloud computing (CLOUD), 2012 IEEE 5th international conference on. IEEE, pp 59–66
Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Douglas DH, Peucker TK (1973) Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10(2):112–122
Ester M, Kriegel HP, Sander J, Xu X, et al. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, vol 96, pp 226–231
Fan Q, Zhang D, Wu H, Tan KL (2016) A general and parallel platform for mining co-movement patterns over large-scale trajectories. Proc VLDB Endowment 10(4):313–324
Gudmundsson J, van Kreveld M (2006) Computing longest duration flocks in trajectory data. In: Proceedings of the 14th annual ACM international symposium on advances in geographic information systems. ACM, pp 35–42
He Y, Tan H, Luo W, Feng S, Fan J (2014) Mr-dbscan: a scalable mapreduce-based dbscan algorithm for heavily skewed data. Front Comput Sci 8(1):83–99
Hua KA, Lee C (1991) Handling data skew in multiprocessor database computers using partition tuning. In: VLDB. Citeseer, pp 525–535
Jeung H, Shen HT, Zhou X (2008) Convoy queries in spatio-temporal databases. In: 2008 IEEE 24th international conference on data engineering. IEEE, pp 1457–1459
Jeung H, Yiu ML, Zhou X, Jensen CS, Shen HT (2008) Discovery of convoys in trajectory databases. Proc VLDB Endowment 1(1):1068–1080
Kalnis P, Mamoulis N, Bakiras S (2005) On discovering moving clusters in spatio-temporal data. In: International symposium on spatial and temporal databases. Springer, pp 364–381
Kwon Y, Ren K, Balazinska M, Howe B, Rolia J (2013) Managing skew in hadoop. IEEE Data Eng Bull 36(1):24–33
Lacerda T, Fernandes S (2016) Scalable real-time flock detection. In: Global communications conference (GLOBECOM), 2016 IEEE. IEEE, pp 1–7
Naserian E, Wang X, Xu X, Dong Y (2016) Discovery of loose travelling companion patterns from human trajectories. In: High performance computing and communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2016 IEEE 18th International Conference on. IEEE, pp 1238–1245
Orakzai F, Calders T, Pedersen TB (2016) Distributed convoy pattern mining. In: 17th IEEE international conference on mobile data management
Orakzai F, Devogele T, Calders T (2015) Towards distributed convoy pattern mining. In: Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems, GIS ’15. ACM, New York, pp 50:1–50:4. https://doi.org/10.1145/2820783.2820840
Patwary MMA, Palsetia D, Agrawal A, Liao WK, Manne F, Choudhary A (2012) A new scalable parallel dbscan algorithm using the disjoint-set data structure. In: High performance computing, networking, storage and analysis (SC), 2012 international conference for. IEEE, pp 1–11
Tang LA, Zheng Y, Yuan J, Han J, Leung A, Hung CC, Peng WC (2012) On discovery of traveling companions from streaming trajectories. In: 2012 IEEE 28th International conference on data engineering (ICDE). IEEE, pp 186–197
Vieira MR, Bakalov P, Tsotras VJ (2009) On-line discovery of flock patterns in spatio-temporal data. In: Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems. ACM, pp 286–295
Wang D, Joshi G, Wornell G (2014) Efficient task replication for fast response times in parallel computation. In: ACM SIGMETRICS performance evaluation review, vol 42. ACM, pp 599–600
Yoon H, Shahabi C (2009) Accurate discovery of valid convoys from moving object trajectories. In: ICDM workshops, pp 636–643
Yuan J, Zheng Y, Xie X, Sun G (2011) Driving with knowledge from the physical world. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 316–324
Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, Huang Y (2010) T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International conference on advances in geographic information systems. ACM, pp 99–108
Zaharia M, Chowdhury M, Das T, Dave A, Ma J, McCauley M, Franklin MJ, Shenker S, Stoica I (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX conference on networked systems design and implementation. USENIX Association, pp 2–2
Appendix A: Scalability on the NUMA architecture
NUMA (Non-Uniform Memory Access) systems are low-cost multi-processor platforms that support large numbers of processors on a single board. Fast CPUs are generally constrained by memory bandwidth under memory-intensive workloads. Symmetric multiprocessing (SMP) systems use a shared bus to connect processors, so many processors have to compete for memory bandwidth. The NUMA architecture solves this problem by connecting several low-end processor nodes, each with its own cache and memory, through a high-speed interconnect. Each node has a memory controller that allows it to use the memory of all other nodes in addition to its own, thus abstracting the memory as a single image. When a processor requests data from a memory location that does not exist in its local memory, the data is transferred over the NUMA interconnect, which is slower than the connection between the processor and its local memory. Thus, memory access time is not uniform and varies depending on whether the access is local or remote.
In NUMA systems, the cache coherence problem occurs when two or more processors access the same shared data. If one processor modifies its copy of the data, the copies of this data in the caches of other processors become stale. ccNUMA (cache-coherent NUMA) machines ensure that a processor accessing a memory location receives the most up-to-date version of the data. Cache coherence can be ensured in either software or hardware; however, software approaches tend to be slower than hardware ones.
We analysed the performance of DCMMR on AMD Opteron based NUMA machines. Figure 31 shows the architecture of the AMD Opteron 6300 series processors. The processor has two NUMA nodes connected by a HyperTransport (HT) bus. Each node has 8 cores, arranged in 4 pairs such that each pair shares a Floating Point Unit (FPU) and a 2 MB L2 cache. The pairs are connected to each other by a crossbar switch, which connects to the HT bus through an HT interface. Each node has its own memory controller with 2 channels, each supporting up to 32 GB of memory.
Figure 32 shows the architecture of the AMD Opteron 6300 series quad-processor ccNUMA system which we used for one set of our experiments. The system consists of 4 AMD Opteron 6376 processors (Fig. 31) interconnected through HT buses. The system has 512 GB of memory (128 GB per processor, 64 GB per NUMA node). If a processor core is the first one to request a memory page, the page is mapped to the memory of the node to which the core belongs (the first-touch policy). A NUMA-aware OS tries to keep a thread running on the same core pair because the pair shares an L2 cache. Moving a thread to another core pair causes performance degradation because of cache invalidation, and moving it to another node hurts performance further because it then needs to fetch data from the remote node’s memory. Therefore, although running a process on multiple cores via context switching increases performance, the increase might not be linear, depending on which cores the process is moved to.
If an algorithm accesses all of its data from memory, ccNUMA increases the aggregate memory bandwidth by a factor roughly equal to the number of NUMA nodes. In our case, we would expect 8 times the memory bandwidth of an SMP machine, but this does not necessarily mean that the performance of an algorithm will scale linearly with the number of cores, because of the performance bottlenecks explained above. For optimal NUMA performance, a NUMA-aware OS should take the following steps:
- Schedule a process on cores as close as possible to the memory that contains its data.
- Maintain a scheduling queue per node.
- Allocate a process’s memory within the memory of a single node.
- Schedule all child processes on the same node during the lifetime of the parent process.
The two most common policies supported by the Linux kernel are NODE LOCAL and INTERLEAVE. In NODE LOCAL mode, an allocation occurs from the memory node local to where the code is currently executing, whereas in INTERLEAVE mode, allocations occur round-robin across the nodes. The INTERLEAVE policy is used to distribute memory accesses for data structures that may be accessed from multiple processors in the system, in order to have an even load on the interconnect and on the memory of each node.
The memory management policies of the OS work best for the general case, not for a specific application with a different memory access behaviour. When the memory load of a NUMA system increases, its memory management overhead increases, degrading overall performance. Therefore, the best approach is to have the application do the management itself. Hadoop runs in Java Virtual Machines (JVMs), which come with support for NUMA, but Hadoop itself is not NUMA-aware. Thus, an algorithm running on Hadoop on a NUMA system shows lower scalability in terms of number of cores when compared to its execution on a cluster of SMP machines with the same number of cores.
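Although Hadoop itself is not NUMA-aware, the JVMs it launches can be asked to use NUMA-aware heap allocation via the HotSpot flag -XX:+UseNUMA (effective in combination with the parallel collector). A hedged configuration sketch, using the standard Hadoop property for map-task JVM options (heap size shown is illustrative):

```xml
<!-- Sketch: passing the HotSpot NUMA-aware allocator flag to map-task JVMs.
     -XX:+UseNUMA takes effect with -XX:+UseParallelGC; whether it helps
     depends on the workload and task placement. -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx4g -XX:+UseParallelGC -XX:+UseNUMA</value>
</property>
```

This only makes each task’s heap node-local; it does not make Hadoop’s task scheduling itself NUMA-aware.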
Orakzai, F., Pedersen, T.B. & Calders, T. Distributed mining of convoys in large scale datasets. Geoinformatica 25, 353–396 (2021). https://doi.org/10.1007/s10707-020-00431-w