Preference-aware sequence matching for location-based services

Wang, Hao; Lu, Ziyu

doi:10.1007/s10707-019-00370-1

Published: 21 June 2019

Volume 24, pages 107–131, (2020)

GeoInformatica

Hao Wang¹ &
Ziyu Lu²

Abstract

Sequential data are important in many real-world location-based services. In this paper, we study the problem of sequence matching. Specifically, we want to identify the sequences most similar to a given sequence under the three most commonly used preference-aware similarity measures, i.e., Fagin’s intersection metric, Kendall’s tau, and Spearman’s footrule. We first analyze the properties of these three preference-aware similarity measures, revealing the connection between them and set intersection. Then, we build an index structure, which is essentially a doubly linked list, to facilitate efficient sequence matching. Lower and upper bounds are derived to support prefix-based filtering. Experiments on various datasets show that our proposed method outperforms the baselines by a large margin.
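To make two of the measures named in the abstract concrete, the sketch below computes Spearman’s footrule and Kendall’s tau in their textbook form, assuming both inputs are full rankings of the same items. The top-m variants analyzed in the paper are not reproduced here, and the function names and the fruit lists are hypothetical.

```python
from itertools import combinations


def ranks(seq):
    """Map each item to its 1-based position in the ranking."""
    return {item: pos for pos, item in enumerate(seq, start=1)}


def spearman_footrule(pi, sigma):
    """Sum of absolute rank displacements between two full rankings."""
    r_pi, r_sigma = ranks(pi), ranks(sigma)
    return sum(abs(r_pi[x] - r_sigma[x]) for x in r_pi)


def kendall_tau(pi, sigma):
    """Number of item pairs that the two rankings order differently."""
    r_sigma = ranks(sigma)
    # combinations(pi, 2) yields (a, b) with a ranked before b in pi,
    # so a pair is discordant when sigma ranks a after b.
    return sum(1 for a, b in combinations(pi, 2) if r_sigma[a] > r_sigma[b])


# Assumption: both lists rank exactly the same set of items.
pi = ["apple", "banana", "cherry", "durian"]
sigma = ["banana", "apple", "durian", "cherry"]
print(spearman_footrule(pi, sigma))  # 4
print(kendall_tau(pi, sigma))        # 2
```

For full rankings the two values always satisfy K ≤ F ≤ 2K (Diaconis and Graham, 1977); the example attains the upper end with F = 4 and K = 2.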

Notes

Some other work may simply denote \(\mathcal {U}\) as the set {1, 2,⋯ ,n} (see for example [5]). These two representations are equivalent in the sense that we can assign a distinct integer ID to each item. In this work, we choose to avoid using such integer IDs so as to minimize any possible confusion between items and their ranks.
For the sake of clarity, we overload the symbol F to compute the distance between top-m lists and suspend the use of the asterisk (∗), which indicates the Hausdorff nature. In addition, F^∗ defined in Eq. 2 is a metric whereas F in Eq. 3 is NOT. This is because in Eq. 3 the universe \(\mathcal {U}\) is considered as π ∪ σ, which is not true in general. However, compared to the (true) Hausdorff distance F^∗, F in Eq. 3 is preferable in some aspects. For example, consider again σ₁ and σ₂, the top-3 lists of fruits in Section 3.1: it is more intuitive to compute the distance based merely on what they have in their lists; it is less intuitive (although it still makes some sense) to alter the distance value whenever, say, a new fruit is appended to the universe.
Those items outside of π ∩ σ are not important, for they have no impact on the distance value. Therefore in this sense it doesn’t matter whether or not σ^′∖ π = σ ∖ π.
Note that ℓ is well-defined since σ(v₀) − 0 = 0 ≤ z − s and σ(v_{s+1}) − (s + 1) = m − s ≥ z − s.
The number of 1’s in a binary string is known as the Hamming weight of that string. It can be efficiently computed using one of the bitwise tricks, named sideways addition [68]. If the operand y is expected to be sparse (i.e., y contains only a few 1’s), then the sideways addition can be done by repeating y ← y & (y − 1) until y = 0 [69].
This DBLP citation network is publicly available at http://arnetminer.org/citation
The Jester dataset is available at http://www.ieor.berkeley.edu/~goldberg/jester-data/
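As a rough illustration of the sparse case in the note on Hamming weight above, the loop below clears the lowest set bit on each pass, so it iterates once per 1-bit rather than once per bit position. The function name `popcount_sparse` is hypothetical and not taken from the paper; this is a minimal Python sketch rather than the authors' implementation.

```python
def popcount_sparse(y: int) -> int:
    """Count the 1-bits of a non-negative integer.

    Each iteration performs y <- y & (y - 1), which clears the
    lowest set bit, so the loop runs exactly once per 1-bit.
    """
    if y < 0:
        raise ValueError("y must be non-negative")
    count = 0
    while y:
        y &= y - 1  # drop the lowest set bit
        count += 1
    return count


print(popcount_sparse(0b101000))  # 2
```

When y is dense, a generic bit count such as `bin(y).count("1")` is likely the simpler choice; the loop pays off only when few bits are set.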

References

Rentfrow PJ, Gosling SD (2003) The do re mi’s of everyday life: the structure and personality correlates of music preferences. J Pers Soc Psychol 84(6):1236–1256
Chausson O (2010) Assessing the impact of gender and personality on film preferences. Technical report, University of Cambridge. myPersonality Project
Cantador I, Fernández-Tobías I, Bellogín A (2013) Relating personality types with user preferences in multiple entertainment domains. In: EMPIRE
Diaconis P, Graham RL (1977) Spearman’s footrule as a measure of disarray. J Royal Statistical Soc Series B (Methodol) 39(2):262–268
Critchlow DE (1984) Metric methods for analyzing partially ranked data. Technical Report 225, Dept of Statistics, Stanford University
Salama IA, Quade D (1990) A note on Spearman’s footrule. Comm Statistics 19(2):591–601
Fagin R, Kumar R, Sivakumar D (2003) Comparing top-k lists. SIAM J Discrete Math 17(1):134–160
Wu S, Crestani F (2003) Methods for ranking information retrieval systems without relevance judgements. In: SAC
Webber W, Moffat A, Zobel J (2010) A similarity measure for indefinite rankings. TOIS 28(4):1–34
Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. TKDE 17(6):734–749
Konstas I, Stathopoulos V, Jose JM (2009) On social networks and collaborative recommendation. In: SIGIR
Shang S, Chen L, Wei Z, Jensen CS, Zheng K, Kalnis P (2017) Trajectory similarity join in spatial networks. In: PVLDB
Yue X, Xi M, Chen B, Gao M, He Y, Xu J (2019) A revocable group signatures scheme to provide privacy-preserving authentications. Mobile Networks and Applications
Ester M, Kriegel H-P, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD
Pal K, Michel S (2016) Efficient similarity search across top-k lists under the Kendall’s tau distance. In: SSDBM
Berchtold S, Ertl B, Keim DA, Kriegel H-P, Seidl T (1998) Fast nearest neighbor search in high-dimensional space. In: ICDE
Roussopoulos N, Kelly S, Vincent F (1995) Nearest neighbor queries. In: KDD
Hjaltason GR, Samet H (1999) Distance browsing in spatial databases. TODS 24(2):265–318
Sharifzadeh M, Shahabi C (2010) Vor-tree: R-trees with Voronoi diagrams for efficient processing of spatial nearest neighbor queries. PVLDB 3(1-2):1231–1242
Liu T, Moore AW, Gray A (2006) New algorithms for efficient high-dimensional nonparametric classification. JMLR 7:1135–1158
Sproull RF (1991) Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6:579–589
Beygelzimer A, Kakade S, Langford J (2006) Cover trees for nearest neighbors. In: ICML
Filho RFS, Traina A, Traina C Jr., Faloutsos C (2001) Similarity search without tears: the OMNI-family of all-purpose access methods. In: ICDE
Jagadish HV, Ooi BC, Tan K-L, Yu C, Zhang R (2005) iDistance: an adaptive B⁺-tree based indexing method for nearest neighbor search. TODS 30(2):364–397
Venkateswaran J, Lachwani D, Kahveci T, Jermaine C (2006) Reference-based indexing of sequence databases. In: VLDB
Spearman C (1904) The proof and measurement of association between two things. Am J Psychol 15(1):72–101
Kendall M (1948) Rank correlation methods. Charles Griffin and Co.
Jurman G, Merler S, Barla A, Paoli S, Galea A, Furlanello C (2008) Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics 24(2):258–264
Jurman G, Riccadonna S, Visintainer R, Furlanello C (2009) Canberra distance on ranked lists. In: Adv ranking NIPS 09 Workshop, Whistler, Canada
Jurman G, Riccadonna S, Visintainer R, Furlanello C (2012) Algebraic comparison of partial lists in bioinformatics. PLoS One 7(5):e36540
Chen J, Li Y, Feng L (2012) A new weighted Spearman’s footrule as a measure of distance between rankings. arXiv:1207.2541v2 [cs.DM]
Bartholdi JJ III, Tovey CA, Trick MA (1989) Voting schemes for which it can be difficult to tell who won the election. Soc Choice Welfare 8(2):157–165
Dwork C, Kumar R, Naor M, Sivakumar D (2001) Rank aggregation methods for the Web. In: WWW
Ailon N (2007) Aggregation of partial rankings, p-ratings and top-m lists. In: SODA
Sculley D (2007) Rank aggregation for similar items. In: SDM
Fang Q, Feng J, Ng W (2011) Identifying differentially-expressed genes via weighted rank aggregation. In: ICDM
Liu Y-T, Liu T-Y, Qin T, Ma Z-M, Li H (2007) Supervised rank aggregation. In: WWW
Klementiev A, Roth D, Small K (2008) Unsupervised rank aggregation with distance-based models. In: ICML
Fagin R, Kumar R, Sivakumar D (2003) Efficient similarity search and classification via rank aggregation. In: SIGMOD
Witten IH, Moffat A, Bell TC (1999) Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edn. Morgan Kaufmann, Burlington
Sanders P, Transier F (2007) Intersection in integer inverted indices. In: ALENEX
Mirzazadeh M (2004) Adaptive comparison-based algorithms for evaluating set queries. Master’s thesis, University of Waterloo
Bille P, Pagh A, Pagh R (2007) Fast evaluation of union-intersection expressions. In: ISAAC
Blelloch GE, Reid-Miller M (1998) Fast set operations using treaps. In: SPAA
Ding B, König AC (2011) Fast set intersection in memory. In: VLDB
Shang S, Ding R, Bo Y, Xie K, Zheng K, Kalnis P (2012) User oriented trajectory search for trip recommendation. In: EDBT
Cao X, Chen L, Cong G, Xiao X (2012) Keyword-aware optimal route search. In: PVLDB
Cao X, Chen L, Cong G, Jensen CS, Qu Q, Skovsgaard A, Wu D, Yiu ML (2012) Spatial keyword querying. In: ER
Cao X, Chen L, Cong G, Guan J, Phan N-T, Xiao X (2013) KORS: Keyword-aware optimal route search system. In: ICDE
Han J, Wen J-R (2013) Mining frequent neighborhood patterns in a large labeled graph. In: CIKM
Han J, Wen J-R, Pei J (2014) Within-network classification using radius-constrained neighborhood patterns. In: CIKM
Han J, Zheng K, Sun A, Shang S, Wen J-R (2016) Discovering neighborhood pattern queries by sample answers in knowledge base. In: ICDE
Shang S, Ding R, Zheng K, Jensen CS, Kalnis P, Zhou X (2014) Personalized trajectory matching in spatial networks. VLDB J 23(3):449–468
Shang S, Chen L, Wei Z, Jensen CS, Wen J-R, Kalnis P (2016) Collective travel planning in spatial networks. TKDE 28(5):1132–1146
Shang S, Chen L, Jensen CS, Wen J-R, Kalnis P (2017) Searching trajectories by regions of interest. TKDE 29(7):1549–1562
Shang S, Chen L, Zheng K, Jensen CS, Wei Z, Kalnis P (2018) Parallel trajectory to location join. TKDE, online first
Chen L, Cui Y, Cong G, Cao X (2014) SOPS: A system for efficient processing of spatial-keyword publish/subscribe. In: PVLDB
Chen L, Cong G, Cao X, Tan K-L (2015) Temporal spatial-keyword top-k publish/subscribe. In: ICDE
Chen L, Cong G (2015) Diversity-aware top-k publish/subscribe for text stream. In: SIGMOD
Chen Z, Cong G, Zhang Z, Tom ZJ, Chen L (2017) Distributed publish/subscribe query processing on the spatio-textual data stream. In: ICDE
Chen L, Shang S, Zhang Z, Cao X, Jensen CS, Kalnis P (2018) Location-aware top-k term publish/subscribe. In: ICDE
Li M, Chen L, Cong G, Gu Y, Yu G (2016) Efficient processing of location-aware group preference queries. In: CIKM
An L, Wang W, Shang S, Li Q, Zhang X (2018) Efficient task assignment in spatial crowdsourcing with worker and task privacy protection. GeoInformatica 22(2):335–362
Chen L, Cong G, Cao X (2013) An efficient query indexing mechanism for filtering geo-textual data. In: SIGMOD
Zhao K, Liu Y, Yuan Q, Chen L, Chen Z, Cong G (2016) Towards personalized maps: mining user preferences from geo-textual data. In: PVLDB
Li X, Cheng Y, Cong G, Chen L (2017) Discovering pollution sources and propagation patterns in urban area. In: KDD
Zhao K, Chen L, Cong G (2016) Topic exploration in spatio-temporal document collections. In: SIGMOD
Knuth DE (2009) Bitwise Tricks & Techniques; Binary Decision Diagrams, volume 4, fascicle 1 of The Art of Computer Programming, chapter 7. Addison-Wesley
Wegner P (1960) A technique for counting ones in a binary computer. CACM 3(5):322
Tang J, Zhang D, Yao L (2007) Social network extraction of academic researchers. In: ICDM’07
Tang J, Zhang J, Yao L, Li J, Li Z, Su Z (2008) Arnetminer: Extraction and mining of academic social networks. In: KDD
Tang J, Yao L, Zhang D, Zhang J (2010) A combination approach to web user profiling. ACM TKDD 5(1):1–44
Tang J, Zhang J, Jin R, Zi Y, Cai K, Li Z, Su Z (2011) Topic level expertise search over heterogeneous networks. Machine Learning Journal 82(2):211–237
Tang J, Fong ACM, Bo W, Zhang J (2012) A unified probabilistic framework for name disambiguation in digital library. TKDE 24(6):975–987
Goldberg K, Roeder T, Gupta D, Perkins C (2001) Eigentaste: a constant time collaborative filtering algorithm. J Inform Retrieval 4:133–151

Author information

Authors and Affiliations

Inception Institute of Artificial Intelligence, Abu Dhabi, UAE
Hao Wang
Central University of Finance and Economics, Beijing, China
Ziyu Lu

Corresponding author

Correspondence to Hao Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Lu, Z. Preference-aware sequence matching for location-based services. Geoinformatica 24, 107–131 (2020). https://doi.org/10.1007/s10707-019-00370-1

Received: 30 March 2019
Revised: 25 April 2019
Accepted: 13 May 2019
Published: 21 June 2019
Issue Date: January 2020
DOI: https://doi.org/10.1007/s10707-019-00370-1
