Enhancing local live tweet stream to detect news

GeoInformatica

Abstract

Twitter captures invaluable information about real-world news, ranging in scale from large national and international stories, such as a presidential election, to small local stories, such as a local farmers market. Detecting and extracting small news stories for a local place is a challenging problem and the focus of this work. The main challenge lies in identifying the small stories that correspond to a local area of interest; they are typically harder to detect than national stories because there may be just a handful of tweets about a local story. A system, called Firefly, is proposed that overcomes this data sparsity and captures thousands of local stories per day from a metropolitan area (e.g., Boston). The key idea is to combine three steps: enhancing a local live tweet stream in Twitter, identifying “locality-aware” keywords, and using these keywords to cluster tweets. Experiments show that the proposed system has a significantly higher recall over a set of representative local news agencies and, at the same time, outperforms the baseline approach TwitterStand. More importantly, the results also demonstrate that, by utilizing the enhanced local live tweet stream, our system discovers much more local news than methods that work only on geotagged tweets, i.e., those with embedded GPS coordinate values.
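To make the idea of “locality-aware” keywords and keyword-based clustering more concrete, the following is a minimal illustrative sketch. It is not Firefly's actual algorithm, which the abstract does not specify. The scoring rule (a smoothed ratio of document frequencies in a local stream versus a background stream), the threshold `min_score`, and all function names are assumptions introduced purely for illustration.

```python
from collections import Counter, defaultdict

PUNCT = ".,!?:;\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [tok for tok in tokens if tok]


def locality_scores(local_tweets, background_tweets, smoothing=1.0):
    """Hypothetical locality score: how much more often a term appears
    in local tweets than in background tweets (smoothed ratio of
    document frequencies). An assumed stand-in, not the paper's measure."""
    local_df = Counter(t for tw in local_tweets for t in set(tokenize(tw)))
    back_df = Counter(t for tw in background_tweets for t in set(tokenize(tw)))
    n_local, n_back = len(local_tweets), len(background_tweets)
    scores = {}
    for term, df in local_df.items():
        p_local = (df + smoothing) / (n_local + smoothing)
        p_back = (back_df[term] + smoothing) / (n_back + smoothing)
        scores[term] = p_local / p_back
    return scores


def cluster_by_keyword(local_tweets, scores, min_score=1.5):
    """Group each tweet under its highest-scoring keyword; tweets with
    no keyword at or above min_score are left unclustered."""
    clusters = defaultdict(list)
    for tw in local_tweets:
        terms = [t for t in set(tokenize(tw)) if scores.get(t, 0.0) >= min_score]
        if not terms:
            continue
        # Break score ties alphabetically so the result is deterministic.
        best = max(terms, key=lambda t: (scores[t], t))
        clusters[best].append(tw)
    return dict(clusters)


# Toy data, invented for illustration only.
local = [
    "Fenway farmers market opens today",
    "Great produce at the Fenway market",
    "Watching the election results tonight",
]
background = [
    "Election results are coming in",
    "The election is tonight",
    "What a game tonight",
    "Watching the game",
]

scores = locality_scores(local, background)
clusters = cluster_by_keyword(local, scores)
print(clusters)
```

In this toy run, terms that appear mostly in the local stream (such as "fenway" and "market") score higher than terms common everywhere (such as "election"), so the two farmers market tweets are grouped together while the election tweet is left out. With so little background data, even common words such as "at" can pass the threshold; a real system would need a far larger background sample and a more careful keyword measure.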

Author information

Correspondence to Hong Wei.

Cite this article

Wei, H., Sankaranarayanan, J. & Samet, H. Enhancing local live tweet stream to detect news. Geoinformatica 24, 411–441 (2020). https://doi.org/10.1007/s10707-019-00392-9

  • DOI: https://doi.org/10.1007/s10707-019-00392-9
