Abstract
Social networks such as Twitter and Facebook have recently become the most widely used communication platforms for propagating information rapidly. This fast diffusion of information creates accuracy and scalability problems for topic detection. Most existing approaches can detect the most popular topics at large scale, but they are not effective for fast detection. This article proposes a novel topic detection approach, the Node Significance based Label Propagation Community Detection (NSLPCD) algorithm, which detects topics faster without compromising accuracy. The proposed algorithm analyzes the frequency distribution of keywords in a collection of tweets and identifies two types of keywords that play an important role in topic detection: topic-identifying and topic-describing keywords. A keyword co-occurrence graph is built from these keywords, and the NSLPCD algorithm is then applied to obtain topic clusters in the form of communities. Experimental results on real Twitter data show that the proposed method is effective in both quality and run-time performance compared with existing methods.
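The pipeline outlined in the abstract (keywords, then a co-occurrence graph, then communities as topics) can be illustrated with a minimal sketch. It is not the NSLPCD algorithm: it uses plain weighted label propagation with no node-significance step, and it assumes the keyword set has already been chosen. The function names `build_cooccurrence_graph` and `label_propagation`, the whitespace tokenization, the fixed node order, and the alphabetical tie-breaking are all assumptions made here for illustration and reproducibility.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(tweets, keywords):
    """Build a weighted undirected graph over keywords.

    The weight of edge (a, b) is the number of tweets containing both a and b.
    Tokenization by whitespace is a simplifying assumption.
    """
    graph = defaultdict(Counter)
    for tweet in tweets:
        present = sorted(set(tweet.lower().split()) & keywords)
        for a, b in combinations(present, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iters=20):
    """Plain weighted label propagation (not NSLPCD).

    Each node starts with its own label and repeatedly adopts the label with
    the largest total edge weight among its neighbours. Nodes are visited in
    sorted order and ties go to the alphabetically smallest label, so the
    result is deterministic (an assumption; the usual variant is randomized).
    """
    labels = {node: node for node in graph}
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            votes = Counter()
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            best = max(votes.values())
            new_label = min(l for l, v in votes.items() if v == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = defaultdict(set)
    for node, label in labels.items():
        communities[label].add(node)
    return sorted(sorted(members) for members in communities.values())


if __name__ == "__main__":
    # Toy data, invented for this example.
    tweets = [
        "earthquake hits tokyo",
        "tokyo earthquake magnitude",
        "earthquake magnitude tokyo",
        "election vote results",
        "vote election results today",
        "election results vote",
    ]
    keywords = {"earthquake", "tokyo", "magnitude", "election", "vote", "results"}
    graph = build_cooccurrence_graph(tweets, keywords)
    for topic in label_propagation(graph):
        print(topic)
```

Each returned community is a list of keywords that can be read as one topic cluster. Keywords that never co-occur with another keyword do not enter the graph and are therefore not assigned to any community in this sketch.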
Cite this article
Singh, J., Singh, A.K. NSLPCD: Topic based tweets clustering using Node significance based label propagation community detection algorithm. Ann Math Artif Intell 89, 371–407 (2021). https://doi.org/10.1007/s10472-020-09709-z
DOI: https://doi.org/10.1007/s10472-020-09709-z