Preference-aware sequence matching for location-based services

GeoInformatica

Abstract

Sequential data are important in many real-world location-based services. In this paper, we study the problem of sequence matching. Specifically, we want to identify the sequences most similar to a given sequence under three of the most commonly used preference-aware similarity measures, i.e., Fagin’s intersection metric, Kendall’s tau, and Spearman’s footrule. We first analyze the properties of these three measures, revealing the connection between them and set intersection. Then, we build an index structure, which is essentially a doubly linked list, to facilitate efficient sequence matching. Lower and upper bounds are derived to support prefix-based filtering. Experiments on various datasets show that our proposed method outperforms the baselines by a large margin.
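
As a rough illustration of the three measures named in the abstract, the following Python sketch computes them from their textbook definitions for two equal-length rankings. It is not the paper's top-m formulation (Eqs. 2 and 3 are not reproduced on this page), and the function names and the fruit lists are hypothetical. The footrule and Kendall's tau versions assume both rankings contain exactly the same items; the intersection metric follows the prefix-wise symmetric-difference definition commonly attributed to Fagin et al. [7].

```python
from itertools import combinations


def spearman_footrule(a, b):
    """Sum of absolute rank displacements (assumes a and b rank the same items)."""
    pos_b = {item: rank for rank, item in enumerate(b)}
    return sum(abs(rank - pos_b[item]) for rank, item in enumerate(a))


def kendall_tau_distance(a, b):
    """Number of item pairs that the two rankings order differently."""
    pos_b = {item: rank for rank, item in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a;
    # the pair is discordant when b ranks y before x.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def intersection_metric(a, b):
    """Average over depths i = 1..k of |A_i symmetric-difference B_i| / (2i)."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same length k")
    k = len(a)
    total = 0.0
    for i in range(1, k + 1):
        # Compare the sets formed by the top-i prefixes of each list.
        total += len(set(a[:i]) ^ set(b[:i])) / (2 * i)
    return total / k


if __name__ == "__main__":
    a = ["apple", "banana", "cherry"]
    b = ["banana", "apple", "cherry"]
    print(spearman_footrule(a, b))     # 2
    print(kendall_tau_distance(a, b))  # 1
    print(intersection_metric(a, b))   # about 0.333
```

The intersection metric only inspects prefix sets, which is one way to see the link to set intersection that the abstract mentions: two lists are close when their top-i prefixes overlap heavily at every depth i.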

Notes

  1. Some other work may simply denote \(\mathcal {U}\) as the set {1, 2,⋯ ,n} (see for example [5]). These two representations are equivalent in the sense that we can assign a distinct integer ID to each item. In this work, we choose to avoid using such integer IDs so as to minimize any possible confusion between items and their ranks.

  2. For the sake of clarity, we overload the symbol F to compute the distance between top-m lists and suspend the use of the asterisk (∗), which indicates the Hausdorff nature. In addition, F defined in Eq. 2 is a metric whereas F in Eq. 3 is not. This is because Eq. 3 treats π ∪ σ as the universe \(\mathcal {U}\), which is not true in general. However, compared to the (true) Hausdorff distance F, F in Eq. 3 is preferable in some aspects. For example, consider again σ1 and σ2, the top-3 lists of fruits in Section 3.1: it is more intuitive to compute the distance based merely on what they have in their lists; it is less intuitive (although it still makes some sense) to alter the distance value whenever, say, a new fruit is added to the universe.

  3. Items outside of π ∪ σ are not important, for they have no impact on the distance value. In this sense, it does not matter whether or not σπ = σπ.

  4. Note that is well-defined since σ(v0) − 0 = 0 ≤ zs and σ(vs+ 1) − (s + 1) = mszs.

  5. The number of 1’s in a binary string is known as the Hamming weight of that string. It can be computed efficiently using one of the bitwise tricks known as sideways addition [68]. If the operand y is expected to be sparse (i.e., y contains only a few 1’s), then the sideways addition can be done by repeatedly setting y ← y & (y − 1) and counting the iterations until y = 0 [69].

  6. This DBLP citation network is publicly available at http://arnetminer.org/citation

  7. The Jester dataset is available at http://www.ieor.berkeley.edu/~goldberg/jester-data/
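
The sparse-operand bit count described in note 5 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation; the function name popcount_sparse is hypothetical. Each step y & (y − 1) clears the lowest set bit, so the loop runs once per 1 bit rather than once per bit position.

```python
def popcount_sparse(y: int) -> int:
    """Count the 1 bits of a non-negative integer by clearing one set bit per step."""
    if y < 0:
        raise ValueError("y must be non-negative")
    count = 0
    while y:
        y &= y - 1  # clears the lowest set bit of y
        count += 1
    return count


if __name__ == "__main__":
    print(popcount_sparse(0b101000))  # 2
    print(popcount_sparse(0))         # 0
```

In Python 3.10 and later, the built-in int.bit_count() returns the same value; the explicit loop is shown only to make the trick visible.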

References

  1. Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: the structure and personality correlates of music preferences. J Pers Soc Psychol 84(6):1236–1256

  2. Chausson O (2010) Assessing the impact of gender and personality on film preferences. Technical report, University of Cambridge, myPersonality Project

  3. Cantador I, Fernández-Tobías I, Bellogín A (2013) Relating personality types with user preferences in multiple entertainment domains. In: EMPIRE

  4. Diaconis P, Graham RL (1977) Spearman’s footrule as a measure of disarray. J Royal Statistical Soc Series B (Methodol) 39(2):262–268

  5. Critchlow DE (1984) Metric methods for analyzing partially ranked data. Technical Report 225, Dept of Statistics, Stanford University

  6. Salama IA, Quade D (1990) A note on Spearman’s footrule. Comm Statistics 19(2):591–601

  7. Fagin R, Kumar R, Sivakumar D (2003) Comparing top-k lists. SIAM J Discrete Math 17(1):134–160

  8. Wu S, Crestani F (2003) Methods for ranking information retrieval systems without relevance judgements. In: SAC

  9. Webber W, Moffat A, Zobel J (2010) A similarity measure for indefinite rankings. TOIS 28(4):1–34

  10. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. TKDE 17(6):734–749

  11. Konstas I, Stathopoulos V, Jose JM (2009) On social networks and collaborative recommendation. In: SIGIR

  12. Shang S, Chen L, Wei Z, Jensen CS, Zheng K, Kalnis P (2017) Trajectory similarity join in spatial networks. In: PVLDB

  13. Yue X, Xi M, Chen B, Gao M, He Y, Xu J (2019) A revocable group signatures scheme to provide privacy-preserving authentications. Mobile Networks and Applications

  14. Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD

  15. Pal K, Michel S (2016) Efficient similarity search across top-k lists under the Kendall’s tau distance. In: SSDBM

  16. Berchtold S, Ertl B, Keim DA, Kriegel H-P, Seidl T (1998) Fast nearest neighbor search in high-dimensional space. In: ICDE

  17. Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: KDD

  18. Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. TODS 24(2):265–318

  19. Sharifzadeh M, Shahabi C (2010) Vor-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1-2):1231–1242

  20. Liu T, Moore AW, Gray A (2006) New algorithms for efficient high-dimensional nonparametric classification. JMLR 7:1135–1158

  21. Sproull RF (1991) Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6:579–589

  22. Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbors. In: ICML

  23. Filho RFS, Traina A, Traina C Jr., Faloutsos C (2001) Similarity search without tears: the OMNI-family of all-purpose access methods. In: ICDE

  24. Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. TODS 30(2):364–397

  25. Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequence databases. In: VLDB

  26. Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101

  27. Kendall M (1948) Rank correlation methods. Charles Griffin and Co.

  28. Jurman G, Merler S, Barla A, Paoli S, Galea A, Furlanello C (2008) Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics 24(2):258–264

  29. Jurman G, Riccadonna S, Visintainer R, Furlanello C (2009) Canberra distance on ranked lists. In: Adv ranking NIPS 09 Workshop, Whistler, Canada

  30. Jurman G, Riccadonna S, Visintainer R, Furlanello C (2012) Algebraic comparison of partial lists in bioinformatics. PLoS One 7(5):e36540

  31. Chen J, Li Y, Feng L (2012) A new weighted Spearman’s footrule as a measure of distance between rankings. arXiv:1207.2541v2 [cs.DM]

  32. Bartholdi JJ III, Tovey CA, Trick MA (1989) Voting schemes for which it can be difficult to tell who won the election. Soc Choice Welfare 8(2):157–165

  33. Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the Web. In: WWW

  34. Ailon N (2007) Aggregation of partial rankings, p-ratings and top-m lists. In: SODA

  35. Sculley D (2007) Rank aggregation for similar items. In: SDM

  36. Fang Q, Feng J, Ng W (2011) Identifying differentially-expressed genes via weighted rank aggregation. In: ICDM

  37. Liu Y-T, Liu T-Y, Qin T, Ma Z-M, Li H (2007) Supervised rank aggregation. In: WWW

  38. Klementiev A, Roth D, Small K (2008) Unsupervised rank aggregation with distance-based models. In: ICML

  39. Fagin R, Kumar R, Sivakumar D (2003) Efficient similarity search and classification via rank aggregation. In: SIGMOD

  40. Witten IH, Moffat A, Bell TC (1999) Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann, Burlington

  41. Sanders P, Transier F (2007) Intersection in integer inverted indices. In: ALENEX

  42. Mirzazadeh M (2004) Adaptive comparison-based algorithms for evaluating set queries. Master’s thesis, University of Waterloo

  43. Bille P, Pagh A, Pagh R (2007) Fast evaluation of union-intersection expressions. In: ISAAC

  44. Blelloch GE, Reid-Miller M (1998) Fast set operations using treaps. In: SPAA

  45. Ding B, König AC (2011) Fast set intersection in memory. In: VLDB

  46. Shang S, Ding R, Bo Y, Xie K, Zheng K, Kalnis P (2012) User oriented trajectory search for trip recommendation. In: EDBT

  47. Cao X, Chen L, Cong G, Xiao X (2012) Keyword-aware optimal route search. In: PVLDB

  48. Cao X, Chen L, Cong G, Jensen CS, Qu Q, Skovsgaard A, Wu D, Yiu ML (2012) Spatial keyword querying. In: ER

  49. Cao X, Chen L, Cong G, Guan J, Phan N-T, Xiao X (2013) KORS: Keyword-aware optimal route search system. In: ICDE

  50. Han J, Wen J-R (2013) Mining frequent neighborhood patterns in a large labeled graph. In: CIKM

  51. Han J, Wen J-R, Pei J (2014) Within-network classification using radius-constrained neighborhood patterns. In: CIKM

  52. Han J, Zheng K, Sun A, Shang S, Wen J-R (2016) Discovering neighborhood pattern queries by sample answers in knowledge base. In: ICDE

  53. Shang S, Ding R, Zheng K, Jensen CS, Kalnis P, Zhou X (2014) Personalized trajectory matching in spatial networks. VLDB J 23(3):449–468

  54. Shang S, Chen L, Wei Z, Jensen CS, Wen J-R, Kalnis P (2016) Collective travel planning in spatial networks. TKDE 28(5):1132–1146

  55. Shang S, Chen L, Jensen CS, Wen J-R, Kalnis P (2017) Searching trajectories by regions of interest. TKDE 29(7):1549–1562

  56. Shang S, Chen L, Zheng K, Jensen CS, Wei Z, Kalnis P (2018) Parallel trajectory to location join. TKDE, online first

  57. Chen L, Cui Y, Cong G, Cao X (2014) SOPS: A system for efficient processing of spatial-keyword publish/subscribe. In: PVLDB

  58. Chen L, Cong G, Cao X, Tan K-L (2015) Temporal spatial-keyword top-k publish/subscribe. In: ICDE

  59. Chen L, Cong G (2015) Diversity-aware top-k publish/subscribe for text stream. In: SIGMOD

  60. Chen Z, Cong G, Zhang Z, Tom ZJ, Chen L (2017) Distributed publish/subscribe query processing on the spatio-textual data stream. In: ICDE

  61. Chen L, Shang S, Zhang Z, Cao X, Jensen CS, Kalnis P (2018) Location-aware top-k term publish/subscribe. In: ICDE

  62. Li M, Chen L, Cong G, Gu Y, Yu G (2016) Efficient processing of location-aware group preference queries. In: CIKM

  63. An L, Wang W, Shang S, Li Q, Zhang X (2018) Efficient task assignment in spatial crowdsourcing with worker and task privacy protection. GeoInformatica 22(2):335–362

  64. Chen L, Cong G, Cao X (2013) An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD

  65. Zhao K, Liu Y, Yuan Q, Chen L, Chen Z, Cong G (2016) Towards personalized maps: mining user preferences from geo-textual data. In: PVLDB

  66. Li X, Cheng Y, Cong G, Chen L (2017) Discovering pollution sources and propagation patterns in urban area. In: KDD

  67. Zhao K, Chen L, Cong G (2016) Topic exploration in spatio-temporal document collections. In: SIGMOD

  68. Knuth DE (2009) Bitwise Tricks & Techniques; Binary Decision Diagrams, volume 4, fascicle 1 of The Art of Computer Programming, chapter 7. Addison-Wesley

  69. Wegner P (1960) A technique for counting ones in a binary computer. CACM 3(5):322

  70. Tang J, Zhang D, Yao L (2007) Social network extraction of academic researchers. In: ICDM’07

  71. Tang J, Zhang J, Yao L, Li J, Li Z, Su Z (2008) Arnetminer: Extraction and mining of academic social networks. In: KDD

  72. Tang J, Yao L, Zhang D, Zhang J (2010) A combination approach to web user profiling. ACM TKDD 5(1):1–44

  73. Tang J, Zhang J, Jin R, Zi Y, Cai K, Li Z, Su Z (2011) Topic level expertise search over heterogeneous networks. Machine Learning Journal 82(2):211–237

  74. Tang J, Fong ACM, Bo W, Zhang J (2012) A unified probabilistic framework for name disambiguation in digital library. TKDE 24(6):975–987

  75. Goldberg K, Roeder T, Gupta D, Perkins C (2001) Eigentaste: a constant time collaborative filtering algorithm. J Inform Retrieval 4:133–151

Author information

Corresponding author

Correspondence to Hao Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Lu, Z. Preference-aware sequence matching for location-based services. Geoinformatica 24, 107–131 (2020). https://doi.org/10.1007/s10707-019-00370-1

  • DOI: https://doi.org/10.1007/s10707-019-00370-1
