
Top-k term publish/subscribe for geo-textual data streams

  • Regular Paper
  • The VLDB Journal

Abstract

Massive amounts of data containing spatial, textual, and temporal information are being generated at a rapid pace. Given streams of such data, which include check-ins and geo-tagged tweets, users may want to be kept up to date on which terms are popular in a particular region of space. To enable this functionality, we aim to efficiently process two types of general top-k term subscriptions over streams of spatio-temporal documents: region-based top-k spatio-temporal term (RST) subscriptions and similarity-based top-k spatio-temporal term (SST) subscriptions. RST subscriptions continuously maintain the top-k most popular trending terms within a user-defined region. SST subscriptions free users from defining a region and instead maintain the top-k locally popular terms based on a ranking function that combines term frequency, term recency, and term proximity. We propose solutions capable of supporting real-life location-based publish/subscribe applications that process large numbers of SST and RST subscriptions over a realistic stream of spatio-temporal documents. We study the performance of the proposed solutions in extensive experiments on two spatio-temporal datasets.
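
To make the two subscription types concrete, the following is a minimal sketch and not the solution proposed in the article. It recomputes results from scratch over a small in-memory list instead of maintaining them continuously over a stream. The `Doc` type, the rectangular region, the exponential `decay` factor, the linear proximity weight, and the `max_dist` cutoff are all assumptions made for illustration; the abstract only states that the SST ranking function combines term frequency, term recency, and term proximity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A spatio-temporal document (hypothetical representation)."""
    terms: tuple  # terms occurring in the document
    x: float      # location, x coordinate
    y: float      # location, y coordinate
    t: float      # timestamp


def rst_top_k(docs, region, k):
    """Region-based sketch: rank terms by how many documents inside the
    rectangle (xmin, ymin, xmax, ymax) contain them."""
    xmin, ymin, xmax, ymax = region
    counts = Counter()
    for d in docs:
        if xmin <= d.x <= xmax and ymin <= d.y <= ymax:
            counts.update(set(d.terms))
    # Highest count first; ties broken alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def sst_top_k(docs, loc, now, k, decay=0.5, max_dist=10.0):
    """Similarity-based sketch: every occurrence of a term adds
    recency * proximity to the term's score (an assumed scoring form)."""
    scores = Counter()
    for d in docs:
        recency = decay ** (now - d.t)  # older documents count less
        dist = math.hypot(d.x - loc[0], d.y - loc[1])
        proximity = max(0.0, 1.0 - dist / max_dist)  # farther counts less
        for term in set(d.terms):
            scores[term] += recency * proximity
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    docs = [
        Doc(("coffee", "rain"), 1.0, 1.0, 0.0),
        Doc(("coffee",), 2.0, 2.0, 1.0),
        Doc(("concert", "rain"), 9.0, 9.0, 2.0),
    ]
    print(rst_top_k(docs, region=(0, 0, 5, 5), k=2))
    print(sst_top_k(docs, loc=(9.0, 9.0), now=2.0, k=2))
```

In this sketch the RST subscriber must supply a region, whereas the SST subscriber supplies only a location and lets the score decide which terms count as locally popular.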


Notes

  1. We say a subscription matches a term if the term is a top-k result of the subscription.

  2. The complexity is O(1) when the SF score is computed based on Euclidean distance, or on network distance with pre-computed pairwise distances.

  3. We do not index \(d_4\) because it does not contain \(w_3\).

  4. Parameter M is set to 16–128 in experiments (cf. Sect. 5).

  5. http://lisi.io.

  6. The performance difference between the baseline and TS is negligible when k is small. Thus, we report only the baseline results when varying k.

  7. http://www.cs.utah.edu/~lifeifei/research/tpq/.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (61932004, 61922054, 61872235, 61729202, 61832017, U1636210), the National Key Research and Development Program of China (2018YFC1504504, 2016YFB0700502), and Hong Kong RGC Grant 12201018.

Author information

Corresponding author

Correspondence to Shuo Shang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, L., Shang, S., Jensen, C.S. et al. Top-k term publish/subscribe for geo-textual data streams. The VLDB Journal 29, 1101–1128 (2020). https://doi.org/10.1007/s00778-020-00607-8

  • DOI: https://doi.org/10.1007/s00778-020-00607-8
