Methods matter: (mostly) avoid categorising continuous data – a practical guide
Zachary Orion Binney (1), Mohammad Ali Mansournia (2, 3)
  1. Department of Quantitative Theory and Methods, Oxford College of Emory University, Oxford, Georgia, USA
  2. Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
  3. Sports Medicine Research Center, Neuroscience Institute, Tehran University of Medical Sciences, Tehran, Iran (the Islamic Republic of)
Correspondence to Dr Zachary Orion Binney, Quantitative Theory and Methods, Emory University Oxford College, Oxford, Georgia, USA; zbinney{at}emory.edu

Continuous data appear everywhere in sports science and sports medicine: workload measures, range of movement, muscle strength tests, risk scores, golf shot distances, etc. The variations in these all contain useful information.

But too often that information is lost when the data are categorised. We may only be told whether an athlete’s shoulder external rotation is ‘normal’ versus ‘abnormal’; if their leg strength asymmetry on the single leg vertical jump test is >10% or <10%; if a patient’s risk score places them into a ‘high’ or ‘low’ category; or whether a given correlation coefficient or effect size is ‘strong,’ ‘moderate’ or ‘weak.’

Two often unstated assumptions are required for these methods to be valid: (1) there are no meaningful variations within categories and (2) there are meaningful differences around category cut-points. Only a limited number of situations in clinical research make both assumptions reasonable [1, 2]. Despite this, the unnecessary and inappropriate categorisation of continuous variables is widespread in much of the medical literature, including sports science and sports medicine [2, 3].
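To make the two assumptions concrete, the following is a minimal simulation sketch, not an analysis from the paper. Everything in it is hypothetical: the 10% cut-point is borrowed from the asymmetry example above, and the variable names, the linear exposure-outcome relationship and the noise levels are assumptions chosen purely for illustration. It shows that athletes far apart within a category are treated as identical, that athletes just either side of the cut-point are treated as different, and that the dichotomised variable is less strongly correlated with the outcome than the original continuous one.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def categorise(asymmetry_pct, cut_point=10.0):
    """Hypothetical dichotomisation: 'high' above the cut-point, else 'low'."""
    return "high" if asymmetry_pct > cut_point else "low"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Assumed continuous exposure: leg strength asymmetry in percent.
asymmetry = [rng.gauss(10, 5) for _ in range(n)]

# Assumed outcome: rises smoothly with asymmetry, plus random noise.
# There is no step change at 10% in how these data are generated.
outcome = [0.5 * a + rng.gauss(0, 5) for a in asymmetry]

# Dichotomised version of the same exposure (1 = 'high', 0 = 'low').
high = [1.0 if categorise(a) == "high" else 0.0 for a in asymmetry]

r_cont = pearson(asymmetry, outcome)
r_dich = pearson(high, outcome)

# Assumption 1 fails: 10.1% and 25% land in the same category.
print(categorise(10.1), categorise(25.0))
# Assumption 2 fails: 9.9% and 10.1% land in different categories.
print(categorise(9.9), categorise(10.1))
# Information loss: the dichotomised exposure correlates more weakly.
print(f"continuous r = {r_cont:.3f}, dichotomised r = {r_dich:.3f}")
```

The size of the loss depends on the assumed distribution and on where the cut-point falls, so the printed values should be read as an illustration of the direction of the effect rather than as a general estimate.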

In this paper, we discuss why researchers often choose to categorise continuous data, the assumptions required, and why it is frequently an inappropriate choice. We then provide a brief example from the sports medicine literature. Finally, we discuss methods to analyse continuous variables without categorisation. We hope to dissuade researchers from categorising continuous data in most cases.

Why do researchers categorise data?

Researchers commonly encounter continuous data as independent variables or dependent variables of a statistical model. They also encounter other continuous quantities such as correlation coefficients and effect sizes (eg, Cohen’s d), which for simplicity we will also call ‘data’ (table 1).

Table 1. Recommendations for handling various types of continuous data

Supplemental material

[bjsports-2023-107599supp001.pdf]

The choice to categorise may be guided by the misperception that it is simpler to conduct …

Footnotes

  • Twitter @binney_z

  • Contributors ZOB conceived of the manuscript and provided the data; ZOB and MAM contributed ideas and drafted the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.