Abstract
Methods for designing a comparable replication study have received considerable attention in the published literature, with both Bayesian and non-Bayesian methods having been developed from a hypothesis testing and associated P-value perspective. The purpose of this paper is to describe, using a maximum likelihood-based confidence interval framework, a new frequentist method for choosing the sample size for a comparable replication study. This new method is compared to the published “predictive power” (or “PP”) method. For each of these two methods, a new and easy-to-use formula is derived for computing the optimal comparable replication study sample size that guarantees satisfying a specific confidence interval criterion with a chosen high minimum probability. Connections to hypothesis testing are made, and the Discussion section provides further commentary and considers a numerical example involving published data.
Data Availability
Not applicable.
Code Availability
Not applicable.
Acknowledgements
We acknowledge helpful email exchanges with Professor Leonhard Held and Ms. Charlotte Micheloud of the Epidemiology, Biostatistics and Prevention Institute, Center for Reproducible Science, University of Zurich. We also thank two journal referees for suggestions that greatly improved the paper.
Funding
Not applicable.
Ethics declarations
Conflict of interest
Not applicable.
About this article
Cite this article
Kupper, L.L., Martin, S.L. Replication study design: confidence intervals and commentary. Stat Papers 63, 1577–1583 (2022). https://doi.org/10.1007/s00362-022-01291-2
DOI: https://doi.org/10.1007/s00362-022-01291-2