The Trade Secret Taboo: Open Science Methods are Required to Improve Prediction Models in Sports Medicine and Performance

  • Current Opinion
  • Sports Medicine

Abstract

Clinical prediction models in sports medicine that utilize regression or machine learning techniques have become more widely published, used, and disseminated. However, these models are typically characterized by poor methodology, incomplete reporting, and inadequate evaluation of performance, leading to unreliable predictions and weak clinical utility within their intended sport population. Before implementation in practice, models require a thorough evaluation. Strong, replicable methods and transparent reporting allow practitioners and researchers to make independent judgments about a model’s validity, performance, and clinical usefulness, and to have confidence that it will do no harm. However, this is not reflected in the sports medicine literature. As shown in a recent systematic review of sports injury prediction models, most were characterized by poor methodology, incomplete reporting, and inadequate performance evaluation. Because of constraints imposed by data from individual teams, the development of accurate, reliable, and useful models is highly reliant on external validation. However, a barrier to collaboration is the desire to maintain a competitive advantage; a team’s proprietary information is often perceived as high value, and so these ‘trade secrets’ are frequently guarded. The same secrecy applies to commercially available models, as developers are unwilling to share proprietary (and potentially profitable) development and validation information. In this Current Opinion, we: (1) argue that open science is essential for improving sport prediction models and (2) critically examine sport prediction models for open science practices.

Author information

Corresponding author

Correspondence to Garrett S. Bullock.

Ethics declarations

Funding

Gary S. Collins was supported by the NIHR Biomedical Research Centre, Oxford, and Cancer Research UK (programme Grant: C49297/A27294).

Conflict of interest

Garrett S. Bullock, Patrick Ward, Franco M. Impellizzeri, Tom Hughes, Paula Dhiman, Richard D. Riley, and Gary S. Collins have no conflicts of interest that are directly relevant to the content of this article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and material

Not applicable.

Code availability

Not applicable.

Authors’ contributions

GB, PW, and GSC conceived the study idea. GB, PW, FMI, TH, RR, SK, and GSC were involved in the design and planning. GB and GSC wrote the first draft of the manuscript. GB, PW, FMI, SK, TH, PD, RR, and GSC critically revised the manuscript. All authors approved the final version of the manuscript.

Appendix 1: TRIPOD Checklist: Prediction Model Development and Validation

| Section/topic | Item | D/V | Checklist item | Page |
|---|---|---|---|---|
| Title and abstract | | | | |
| Title | 1 | D;V | Identify the study as developing and/or validating a multivariable prediction model, the target population, and the outcome to be predicted | |
| Abstract | 2 | D;V | Provide a summary of objectives, study design, setting, participants, sample size, predictors, outcome, statistical analysis, results, and conclusions | |
| Introduction | | | | |
| Background and objectives | 3a | D;V | Explain the medical context (including whether diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models | |
| | 3b | D;V | Specify the objectives, including whether the study describes the development or validation of the model or both | |
| Methods | | | | |
| Source of data | 4a | D;V | Describe the study design or source of data (e.g., randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable | |
| | 4b | D;V | Specify the key study dates, including start of accrual; end of accrual; and, if applicable, end of follow-up | |
| Participants | 5a | D;V | Specify key elements of the study setting (e.g., primary care, secondary care, general population) including number and location of centers | |
| | 5b | D;V | Describe eligibility criteria for participants | |
| | 5c | D;V | Give details of treatments received, if relevant | |
| Outcome | 6a | D;V | Clearly define the outcome that is predicted by the prediction model, including how and when assessed | |
| | 6b | D;V | Report any actions to blind assessment of the outcome to be predicted | |
| Predictors | 7a | D;V | Clearly define all predictors used in developing or validating the multivariable prediction model, including how and when they were measured | |
| | 7b | D;V | Report any actions to blind assessment of predictors for the outcome and other predictors | |
| Sample size | 8 | D;V | Explain how the study size was arrived at | |
| Missing data | 9 | D;V | Describe how missing data were handled (e.g., complete-case analysis, single imputation, multiple imputation) with details of any imputation method | |
| Statistical analysis methods | 10a | D | Describe how predictors were handled in the analyses | |
| | 10b | D | Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation | |
| | 10c | V | For validation, describe how the predictions were calculated | |
| | 10d | D;V | Specify all measures used to assess model performance and, if relevant, to compare multiple models | |
| | 10e | V | Describe any model updating (e.g., recalibration) arising from the validation, if done | |
| Risk groups | 11 | D;V | Provide details on how risk groups were created, if done | |
| Development vs validation | 12 | V | For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors | |
| Results | | | | |
| Participants | 13a | D;V | Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful | |
| | 13b | D;V | Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome | |
| | 13c | V | For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors and outcome) | |
| Model development | 14a | D | Specify the number of participants and outcome events in each analysis | |
| | 14b | D | If done, report the unadjusted association between each candidate predictor and outcome | |
| Model specification | 15a | D | Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point) | |
| | 15b | D | Explain how to use the prediction model | |
| Model performance | 16 | D;V | Report performance measures (with CIs) for the prediction model | |
| Model updating | 17 | V | If done, report the results from any model updating (i.e., model specification, model performance) | |
| Discussion | | | | |
| Limitations | 18 | D;V | Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data) | |
| Interpretation | 19a | V | For validation, discuss the results with reference to performance in the development data, and any other validation data | |
| | 19b | D;V | Give an overall interpretation of the results, considering objectives, limitations, results from similar studies, and other relevant evidence | |
| Implications | 20 | D;V | Discuss the potential clinical use of the model and implications for future research | |
| Other information | | | | |
| Supplementary information | 21 | D;V | Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets | |
| Funding | 22 | D;V | Give the source of funding and the role of the funders for the present study | |

  1. CIs confidence intervals, TRIPOD Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis
  2. Items relevant only to the development of a prediction model are denoted by D, items relating solely to a validation of a prediction model are denoted by V, and items relating to both are denoted D;V. We recommend using the TRIPOD Checklist in conjunction with the TRIPOD Explanation and Elaboration document
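
The link between checklist items 15a, 10c, and 16 can be illustrated with a small sketch: when the full model (intercept and all coefficients) is reported, an independent group can calculate predictions for its own athletes and check discrimination and calibration. The Python below is a minimal illustration only, not a method from this article. The intercept, coefficients, predictor names, and the six-athlete cohort are all invented for the example, and a real validation would also report confidence intervals and a calibration slope or plot.

```python
import math

# Hypothetical "published" logistic model (TRIPOD item 15a).
# These numbers are invented for illustration and come from no study.
INTERCEPT = -2.0
COEFFICIENTS = {"age_years": 0.05, "previous_injury": 0.9}


def predict_risk(athlete):
    """Predicted injury probability from the full model (item 10c)."""
    linear_predictor = INTERCEPT + sum(
        coef * athlete[name] for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def c_statistic(risks, outcomes):
    """Discrimination: share of event/non-event pairs in which the
    athlete with the event has the higher predicted risk (ties count 0.5)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    concordant = 0.0
    for event_risk in events:
        for non_event_risk in non_events:
            if event_risk > non_event_risk:
                concordant += 1.0
            elif event_risk == non_event_risk:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed events divided by expected events.
    Values above 1 suggest the model under-predicts risk in this cohort."""
    return sum(outcomes) / sum(risks)


# Made-up external validation cohort (not real data).
validation_cohort = [
    {"age_years": 19, "previous_injury": 0, "injured": 0},
    {"age_years": 24, "previous_injury": 1, "injured": 1},
    {"age_years": 31, "previous_injury": 0, "injured": 0},
    {"age_years": 28, "previous_injury": 1, "injured": 0},
    {"age_years": 22, "previous_injury": 0, "injured": 1},
    {"age_years": 33, "previous_injury": 1, "injured": 1},
]

risks = [predict_risk(athlete) for athlete in validation_cohort]
outcomes = [athlete["injured"] for athlete in validation_cohort]

print("C-statistic:", round(c_statistic(risks, outcomes), 3))
print("Observed/expected:", round(observed_expected_ratio(risks, outcomes), 3))
```

None of these steps is possible when only a headline accuracy figure is published: without the intercept and coefficients, `predict_risk` cannot be written, and so the model cannot be externally validated by anyone other than its developers.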

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bullock, G.S., Ward, P., Impellizzeri, F.M. et al. The Trade Secret Taboo: Open Science Methods are Required to Improve Prediction Models in Sports Medicine and Performance. Sports Med 53, 1841–1849 (2023). https://doi.org/10.1007/s40279-023-01849-6

  • DOI: https://doi.org/10.1007/s40279-023-01849-6
