Predicting the popularity of tweets using internal and external knowledge: an empirical Bayes type approach

Tan, Wai Hong; Chen, Feng

doi:10.1007/s10182-021-00390-z

Original Paper
Published: 26 February 2021

Volume 105, pages 335–352, (2021)

AStA Advances in Statistical Analysis

Abstract

The problem of tweet popularity prediction, that is, forecasting the total number of retweets stemming from an ancestral tweet, has attracted considerable interest recently. The prediction can be accomplished by fitting a point process model to the sequence of retweet times up to a certain censoring time and projecting the fitted model to a future time point. However, models employing such an approach tend to have inferior prediction accuracy when the censoring time is too early for sufficient information to have accumulated. To overcome this, we propose an empirical Bayes type approach to parameter estimation that combines internal knowledge on the times of historical retweets up to the censoring time with external knowledge on complete retweet sequences in the training data. We demonstrate the approach using several point process models with finite-dimensional parameters, where the prior distribution for the parameter of each model is constructed from the external knowledge, and the likelihood is calculated from the internal knowledge. The mode of the posterior distribution is used as the estimator of the finite-dimensional parameter, and the mean of the predictive distribution for the number of retweets implied by each of the estimated models is used to predict the tweet popularity. Using a large Twitter data set, we show that the proposed methodology not only enables prediction at time zero, before the arrival of any retweet event, but also substantially improves the prediction performance of existing models, especially at earlier censoring times.
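
The estimation scheme described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the model is a simple inhomogeneous Poisson process with intensity a·b·exp(−b·t), the prior on (log a, log b) is a Gaussian fitted to per-cascade maximum likelihood estimates from complete training sequences, and the function names (fit_prior, map_estimate, predict_total) are hypothetical. The paper's own models and prior construction may differ.

```python
import numpy as np
from scipy.optimize import minimize


def fit_prior(training_cascades):
    """External knowledge: Gaussian prior on (log a, log b) from complete cascades.

    Assumes every cascade is a non-empty list of positive retweet times.
    For a complete sequence the MLEs are a_hat = n and b_hat = n / sum(times).
    """
    thetas = np.array([
        [np.log(len(c)), np.log(len(c) / np.sum(c))] for c in training_cascades
    ])
    mean = thetas.mean(axis=0)
    # A small ridge keeps the covariance invertible (illustrative choice).
    cov = np.cov(thetas, rowvar=False) + 1e-6 * np.eye(2)
    return mean, cov


def log_likelihood(theta, times, T):
    """Internal knowledge: Poisson process log-likelihood on [0, T]."""
    log_a, log_b = theta
    a, b = np.exp(log_a), np.exp(log_b)
    times = np.asarray(times, dtype=float)
    # Sum of log-intensities at the event times minus the integrated intensity.
    return np.sum(log_a + log_b - b * times) - a * (1.0 - np.exp(-b * T))


def log_prior(theta, mean, cov_inv):
    """Gaussian log-prior density, up to an additive constant."""
    d = np.asarray(theta, dtype=float) - mean
    return -0.5 * d @ cov_inv @ d


def map_estimate(times, T, mean, cov):
    """Posterior mode of (log a, log b) given retweet times observed up to T."""
    cov_inv = np.linalg.inv(cov)

    def neg_log_post(theta):
        return -(log_likelihood(theta, times, T) + log_prior(theta, mean, cov_inv))

    res = minimize(
        neg_log_post,
        x0=np.asarray(mean, dtype=float),
        method="Nelder-Mead",
        options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 2000},
    )
    return res.x


def predict_total(times, T, theta):
    """Observed count plus the expected number of events after T, a * exp(-b * T)."""
    a, b = np.exp(theta)
    return len(times) + a * np.exp(-b * T)
```

With no retweets observed and T = 0 the likelihood term vanishes, so the posterior mode coincides with the prior mean and a prediction is still available. This mirrors, in simplified form, the time-zero prediction mentioned in the abstract.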

Data Availability Statement

Data are available from http://snap.stanford.edu/seismic/

Acknowledgements

The authors gratefully acknowledge the constructive comments from the reviewers, which have led to improved presentation. This research includes computations using the computational cluster Katana supported by Research Technology Services at UNSW Sydney. The research also benefited from the assistance of resources from the National Computational Infrastructure (NCI), supported by the Australian Government.

Funding

Tan was supported by UMK Fundamental Research Grant [R/FUND/A0100/01348A/001/2020/00840]. Chen was partly supported by UNSW Science Faculty Research Grant [PS35307].

Author information

Authors and Affiliations

Universiti Malaysia Kelantan, Kelantan, Malaysia
Wai Hong Tan
UNSW Sydney, Sydney, Australia
Feng Chen

Corresponding author

Correspondence to Wai Hong Tan.

Ethics declarations

Conflicts of interest

Not applicable.

Code availability

Code is available from the authors upon request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Fig. 5.

About this article

Cite this article

Tan, W.H., Chen, F. Predicting the popularity of tweets using internal and external knowledge: an empirical Bayes type approach. AStA Adv Stat Anal 105, 335–352 (2021). https://doi.org/10.1007/s10182-021-00390-z

Received: 16 November 2019
Accepted: 08 February 2021
Published: 26 February 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s10182-021-00390-z
