A computational literature review of football performance analysis through probabilistic topic modeling

Artificial Intelligence Review

Abstract

This research illustrates the potential use of concepts, techniques, and mining process tools to improve the systematic review process. A review covering 2012 to 2019 was performed on two online databases (Scopus and ISI Web of Science). A total of 9649 studies were identified and analyzed using probabilistic topic modeling procedures within a machine learning approach. The Latent Dirichlet Allocation method, chosen for modeling, required two stages: (1) data cleansing and (2) modeling the data into topics for coherence and perplexity analysis. All research was conducted in a fully computerized way, according to the standards of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). The computational literature review is an integral part of a broader literature review process. The results address three criteria: (1) a literature review of the research area, (2) analysis and classification of journals, and (3) analysis and classification of academic and individual research teams. The contribution of the article is to demonstrate how the publication network is formed in this field of research and how the content of abstracts can be analyzed automatically to provide a set of research topics for quick understanding and application in future projects.
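The two stages named in the abstract (data cleansing, then topic modeling scored by perplexity and coherence) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy corpus, the stopword list, the collapsed Gibbs sampler, the hyperparameters `alpha` and `beta`, and the UMass-style coherence score are all choices made here for the example, and perplexity is computed on the training documents rather than on held-out data.

```python
import math
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"and", "the", "for", "with", "from", "through"}

def clean(text):
    """Stage 1, data cleansing: lowercase, tokenize, drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) > 2 and t not in STOPWORDS]

def fit_lda(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Stage 2, data modeling: LDA fitted by collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v = len(vocab)
    n_dk = [[0] * num_topics for _ in docs]        # topic counts per document
    n_kw = [Counter() for _ in range(num_topics)]  # word counts per topic
    n_k = [0] * num_topics                         # total tokens per topic

    def update(d, w, k, delta):
        n_dk[d][k] += delta
        n_kw[k][w] += delta
        n_k[k] += delta

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token, then resample its topic
                weights = [(n_dk[d][k] + alpha) * (n_kw[k][w] + beta) / (n_k[k] + v * beta)
                           for k in range(num_topics)]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)

    # Smoothed document-topic (theta) and topic-word (phi) distributions.
    theta = [[(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
             for row, doc in zip(n_dk, docs)]
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + v * beta) for w in vocab}
           for k in range(num_topics)]
    return theta, phi

def perplexity(docs, theta, phi):
    """exp of the negative mean per-token log-likelihood (lower is better)."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            log_lik += math.log(sum(theta[d][k] * phi[k][w] for k in range(len(phi))))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def umass_coherence(top_words, docs):
    """UMass-style coherence from document co-occurrence counts (higher is better)."""
    doc_sets = [set(doc) for doc in docs]
    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            both = sum(1 for s in doc_sets if top_words[i] in s and top_words[j] in s)
            single = sum(1 for s in doc_sets if top_words[j] in s)
            score += math.log((both + 1) / single)
    return score

# Made-up toy "abstracts", for illustration only.
raw = [
    "Passing networks and ball possession in football matches",
    "Ball possession and passing accuracy of football teams",
    "Topic modeling of abstracts with latent Dirichlet allocation",
    "Latent Dirichlet allocation and topic coherence for abstracts",
]
docs = [clean(text) for text in raw]

# Compare candidate topic counts by perplexity and mean coherence.
for k in (2, 3):
    theta, phi = fit_lda(docs, k)
    tops = [sorted(p, key=p.get, reverse=True)[:3] for p in phi]
    mean_coh = sum(umass_coherence(t, docs) for t in tops) / k
    print(k, round(perplexity(docs, theta, phi), 2), round(mean_coh, 2), tops)
```

In practice the number of topics is usually chosen by comparing these scores across several candidate values; a library implementation would replace the hand-written sampler on a corpus of realistic size.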

Figs. 1–8 (images not included). Fig. 2 adapted from Mortenson and Vidgen (2016).

References

  • Asmussen CB, Møller C (2019) Smart literature review: a practical topic modelling approach to exploratory literature review. J Big Data 6(1):93

  • Blei D (2012) Probabilistic topic models. Commun ACM 55(4):77–84. https://doi.org/10.1145/2133806.2133826

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022. https://doi.org/10.1162/jmlr.2003.3.4-5.993

  • Blei DM, Lafferty JD (2007) A correlated topic model of science. Ann Appl Statist 1(1):17–35

  • Bornmann L, Mutz R (2015) Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J Am Soc Inf Sci 66(11):2215–2222. https://doi.org/10.1002/asi

  • Brown PF, Della Pietra VJ, Mercer RL, Della Pietra SA, Lai JC (1992) An estimate of an upper bound for the entropy of English. Comput Linguist 18(1):31–40

  • Canales CB, Sanz-Valero J (2020) Indicadores de impacto y prestigio de las revistas de ciencias de la salud indizadas en la Red SciELO: estudio comparativo. Rev Esp Salud Pública 94(9):12

  • Carrington PJ, Scott J, Wasserman S (eds) (2005) Models and methods in social network analysis, vol 28. Cambridge University Press, Cambridge

  • Chang J, Boyd-Graber J, Gerrish S, Wang C, Blei DM (2009) Reading tea leaves: How humans interpret topic models. Advances in neural information processing systems 22—proceedings of the 2009 conference, pp 288–296

  • Chen Z, Zhong F, Yuan X (2016) Framework of Integrated Big Data: A Review. IEEE Int Conf Big Data Anal ICBDA 2016:1–5. https://doi.org/10.1109/ICBDA.2016.7509815

  • Egghe L, Rousseau R (2020) The h-index formalism. Scientometrics, 1–9

  • Felizardo KR, Salleh N, Martins RM, Mendes E, MacDonell SG, Maldonado JC (2011) Using visual text mining to support the study selection activity in systematic literature reviews. In: 2011 international symposium on empirical software engineering and measurement, pp 77–86. https://doi.org/10.1109/ESEM.2011.16

  • Figuerola CG, García Marco FJ, Pinto M (2017) Mapping the evolution of library and information science (1978–2014) using topic modeling on LISA. Scientometrics 112(3):1507–1535. https://doi.org/10.1007/s11192-017-2432-9

  • Fuglede B, Topsoe F (2004) Jensen–Shannon divergence and Hilbert space embedding. In: International symposium on information theory, 2004. ISIT 2004. Proceedings. (p. 31). IEEE

  • Ghali N, Panda M, Hassanien AE, Abraham A, Snasel V (2012) Social networks analysis: Tools, measures and visualization. In: Computational social networks, Springer, London, pp 3–23

  • Griffiths TL, Steyvers M (2004) Finding scientific topics. Proc Natl Acad Sci 101(Supplement 1):5228–5235. https://doi.org/10.1073/pnas.0307752101

  • Griffiths TL, Steyvers M, Blei DM, Tenenbaum JB (2005) Integrating topics and syntax. Adv Neural Inf Process Syst 17:537–544.

  • Grinäv AV (2020) The disadvantages of using scientometric indicators in the digital age. IOP Conference Series: Materials Science and Engineering (vol 940, No. 1, p. 012149). IOP Publishing.

  • Han X (2020) Evolution of research topics in LIS between 1996 and 2019: an analysis based on latent Dirichlet allocation topic model. Scientometrics, 1–35

  • Harzing AW (2007) Publish or Perish. https://harzing.com/resources/publish-or-perish

  • Hagberg A, Swart P, S Chult D (2008) Exploring network structure, dynamics, and function using NetworkX (No. LA-UR-08–05495; LA-UR-08–5495). Los Alamos National Lab.(LANL), Los Alamos, NM (United States)

  • Hirsch JE (2005) An index to quantify an individual’s scientific research output. Proc Natl Acad Sci USA 102(46):16569–16572. https://doi.org/10.1073/pnas.0507655102

  • Horta V, Ströele V, Braga R, David JMN, Campos F (2018) Analyzing scientific context of researchers and communities by using complex network and semantic technologies. Futur Gener Comput Syst 89:584–605. https://doi.org/10.1016/j.future.2018.07.012

  • Hu C, Yang F, Zu X, Huang Z (2020) H Index weighted by Eigenfactor of Citations for Journal Evaluation. Contemp Perspect Data Mining 4:103

  • Jacsó P (2010) Comparison of journal impact rankings in the SCImago Journal & Country Rank and the Journal Citation Reports databases. Online Inf Rev 34(4):642–657. https://doi.org/10.1108/14684521011073034

  • Jahangirian M, Eldabi T, Garg L, Jun GT, Naseer A, Patel B et al (2011) A rapid review method for extremely large corpora of literature: Applications to the domains of modelling, simulation, and management. Int J Inf Manage 31(3):234–243. https://doi.org/10.1016/j.ijinfomgt.2010.07.004

  • Jennex ME (2015) Literature reviews and the review process: an editor-in-chief’s perspective. Commun Assoc Inf Syst 36:139–146

  • La Rosa M, Fiannaca A, Rizzo R, Urso A (2015) Probabilistic topic modeling for the analysis and classification of genomic sequences. BMC Bioinf 16(Suppl 6):9. https://doi.org/10.1186/1471-2105-16-s6-s2

  • Lee H, Kwak J, Song M, Kim CO (2014) Coherence analysis of research and education using topic modeling. Scientometrics 102(2):1119–1137. https://doi.org/10.1021/acsnano.7b00569

  • Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JPA et al (2009) The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. PLoS Med. https://doi.org/10.1371/journal.pmed.1000100

  • Luhn HP (1958) The automatic creation of literature abstracts. IBM J Res Dev 2(2):159–165. https://doi.org/10.1147/rd.22.0159

  • McLevey J, McIlroy-Young R (2017) Introducing metaknowledge: software for computational research in information science, network analysis, and science of science. J Informet 11(1):176–197. https://doi.org/10.1016/j.joi.2016.12.005

  • Mimno D, Wallach HM, Talley E, Leenders M, McCallum A (2011) Optimizing semantic coherence in topic models. In: EMNLP 2011—conference on empirical methods in natural language processing, proceedings of the conference, (2), 262–272.

  • Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339:b2535. https://doi.org/10.1136/bmj.b2535

  • Mortenson MJ, Vidgen R (2016) A computational literature review of the technology acceptance model. Int J Inf Manage 36(6):1248–1259. https://doi.org/10.1016/j.ijinfomgt.2016.07.007

  • Muschelli J (2018) Gathering bibliometric information from the Scopus API using rscopus. R Journal

  • Newman DJ, Block S (2006) Probabilistic topic decomposition of an eighteenth-century American newspaper. J Am Soc Inform Sci Technol 57(6):753–767

  • Ngai EWT, Xiu L, Chau DCK (2009) Application of data mining techniques in customer relationship management: a literature review and classification. Expert Syst Appl 36:2592–2602. https://doi.org/10.1016/j.eswa.2008.02.021

  • Pan R, Fortunato S (2014) Author Impact Factor: tracking the dynamics of individual scientific impact. Sci Rep 4:4880. https://doi.org/10.1038/srep04880

  • Pham B, Bagheri E, Rios P, Pourmasoumi A, Robson RC, Hwee J et al (2018) Improving the conduct of systematic reviews: a process mining perspective. J Clin Epidemiol 103:101–111. https://doi.org/10.1016/j.jclinepi.2018.06.011

  • Rehurek R, Sojka P (2010) Software framework for topic modelling with large corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50. https://doi.org/10.13140/2.1.2393.1847

  • Röder M, Both A, Hinneburg A (2015) Exploring the space of topic coherence measures. In: Proceedings of the eighth ACM international conference on web search and data mining, WSDM ’15, 399–408. https://doi.org/10.1145/2684822.2685324

  • Rose ME, Kitchin JR (2019) pybliometrics: Scriptable bibliometrics using a Python interface to Scopus. SoftwareX 10:100263. https://doi.org/10.1016/j.softx.2019.100263

  • Rowley J, Slack F (2004) Conducting a literature review. Manag Res News 27(6):31–39. https://doi.org/10.1108/01409170410784185

  • Shimada D, Kotani R, Iyatomi H (2016) Document classification through image-based character embedding and wildcard training. Proceedings of the 2016 IEEE international conference on Big Data, Big Data 2016, 3922–3927. https://doi.org/10.1109/BigData.2016.7841067

  • Sievert C, Shirley K (2014) LDAvis: A method for visualizing and interpreting topics. Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, 63–70.

  • Simsek A, Kara R (2018) Using swarm intelligence algorithms to detect influential individuals for influence maximization in social networks. Expert Syst Appl 114:224–236. https://doi.org/10.1016/j.eswa.2018.07.038

  • Syed S, Spruit M (2017) Full-Text or abstract? Examining topic coherence scores using latent dirichlet allocation. In: Proceedings of the 2017 international conference on data science and advanced analytics, DSAA 2017, 2018-January, 165–174. https://doi.org/10.1109/DSAA.2017.61

  • Tabassum S, Pereira FS, Fernandes S, Gama J (2018) Social network analysis: an overview. Wiley Interdiscip Rev Data Mining Knowl Discov 8(5):e1256

  • Tranfield D, Denyer D, Smart P (2003) Towards a methodology for developing evidence-informed management knowledge by means of systematic review. Br J Manag 14(3):207–222. https://doi.org/10.1111/1467-8551.00375

  • Vairavan M, Prayle A, Davies P (2020) You are what you read: bias, journal prestige and manipulation. Archives of Disease in Childhood-Education and Practice

  • van Altena AJ, Moerland PD, Zwinderman AH, Olabarriaga SD (2016) Understanding big data themes from scientific biomedical literature through topic modeling. J Big Data. https://doi.org/10.1186/s40537-016-0057-0

  • Watts DJ (2004) Six degrees: the science of a connected age. WW Norton & Company, New York

  • Yau CK, Porter A, Newman N, Suominen A (2014) Clustering scientific documents with topic modeling. Scientometrics 100(3):767–786. https://doi.org/10.1007/s11192-014-1321-8

Acknowledgement

This work was supported by the Carlos Chagas Filho Foundation for Research Support of the State of Rio de Janeiro [grant number E-26/202.638/2018].

Author information

Corresponding author

Correspondence to Vitor Ayres Principe.

Cite this article

Principe, V.A., de Souza Vale, R.G., de Castro, J.B.P. et al. A computational literature review of football performance analysis through probabilistic topic modeling. Artif Intell Rev 55, 1351–1371 (2022). https://doi.org/10.1007/s10462-021-09998-8
