Tracking the history and evolution of entities: entity-centric temporal analysis of large social media archives

International Journal on Digital Libraries

Abstract

How did the popularity of the Greek Prime Minister evolve in 2015? How did the predominant sentiment about him vary during that period? Were there any controversial sub-periods? What other entities were related to him during these periods? To answer these questions, one needs to analyze archived documents and data about the query entities, such as old news articles or social media archives. In particular, user-generated content posted in social networks, like Twitter and Facebook, can be seen as a comprehensive documentation of our society, and thus, meaningful analysis methods over such archived data are of immense value for sociologists, historians, and other interested parties who want to study the history and evolution of entities and events. To this end, in this paper we propose an entity-centric approach to analyze social media archives and we define measures that allow studying how entities were reflected in social media in different time periods and under different aspects, like popularity, attitude, controversiality, and connectedness with other entities. A case study using a large Twitter archive of 4 years illustrates the insights that can be gained by such an entity-centric and multi-aspect analysis.

Notes

  1. http://www.internetlivestats.com/twitter-statistics/ (August 30, 2018).

  2. http://spark.apache.org/.

  3. https://github.com/iosifidisvasileios/Large-Scale-Entity-Analysis.

  4. http://microposts2016.seas.upenn.edu/.

  5. http://alt.qcri.org/semeval2017/task4/.

  6. https://l3s.de/~iosifidis/TSentiment15/.

  7. http://l3s.de/~iosifidis/tpdl2017/.

  8. https://help.twitter.com/en/rules-and-policies/copyright-policy.

  9. https://www.nytimes.com/2016/03/31/us/politics/donald-trump-abortion.html (August 30, 2018).

Acknowledgements

The work was partially funded by the European Commission for the ERC Advanced Grant ALEXANDRIA (No. 339233) and by the German Research Foundation (DFG) project OSCAR (Opinion Stream Classification with Ensembles and Active leaRners).

Author information

Corresponding author

Correspondence to Pavlos Fafalios.

About this article

Cite this article

Fafalios, P., Iosifidis, V., Stefanidis, K. et al. Tracking the history and evolution of entities: entity-centric temporal analysis of large social media archives. Int J Digit Libr 21, 5–17 (2020). https://doi.org/10.1007/s00799-018-0257-7

  • DOI: https://doi.org/10.1007/s00799-018-0257-7
