“Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher

  • Paper in General Philosophy of Science
European Journal for Philosophy of Science

Abstract

Fisher (1945a, 1945b, 1955, 1956, 1960) criticised the Neyman-Pearson approach to hypothesis testing by arguing that it relies on the assumption of “repeated sampling from the same population.” The present article considers the responses to this criticism provided by Pearson (1947) and Neyman (1977). Pearson interpreted alpha levels in relation to imaginary replications of the original test. This interpretation is appropriate when test users are sure that their replications will be equivalent to one another. However, by definition, scientific researchers do not possess sufficient knowledge about the relevant and irrelevant aspects of their tests and populations to be sure that their replications will be equivalent to one another. Pearson also interpreted the alpha level as a personal rule that guides researchers’ behavior during hypothesis testing. However, this interpretation fails to acknowledge that the same researcher may use different alpha levels in different testing situations. Addressing this problem, Neyman proposed that the average alpha level adopted by a particular researcher can be viewed as an indicator of that researcher’s typical Type I error rate. Researchers’ average alpha levels may be informative from a metascientific perspective. However, they are not useful from a scientific perspective. Scientists are more concerned with the error rates of specific tests of specific hypotheses, rather than the error rates of their colleagues. It is concluded that neither Neyman nor Pearson adequately rebutted Fisher’s “repeated sampling” criticism. Fisher’s significance testing approach is briefly considered as an alternative to the Neyman-Pearson approach.

Notes

  1. The concept of an exact replication can be defined as requiring the duplication of either (a) all possible testing conditions or (b) only those testing conditions that could potentially affect the results of the study. For example, Rubin (2019) defined exact replications in the second way, as requiring “the duplication of all of the aspects of an original study that could potentially affect the results of that study.” This second definition implies that researchers are sure about which aspects of their study are relevant (i.e., “could potentially affect the results”) and which are irrelevant. Hence, it is similar to the concept of an equivalent replication that I discuss later. In the present article, I adopt the first, more common, definition of an exact replication that requires the duplication of “all possible testing conditions,” including both relevant and irrelevant conditions.

  2. Following Spanos (2006), we can distinguish between statistical and substantive adequacy. Statistical adequacy occurs when a statistical model’s assumptions (e.g., normal, independent, and identically distributed data for a simple normal model) are sufficiently consistent with the observed data. Substantive adequacy occurs when the characteristics of the statistical model, sample, and testing methodology (e.g., sampling procedure, measures, testing environment, etc.) are sufficiently consistent with a theoretical data generating process or “chance mechanism” (Neyman 1977, p. 99).

Author information

Correspondence to Mark Rubin.

Additional information

This article belongs to the Topical Collection: Philosophical Perspectives on the Replicability Crisis

Guest Editors: Mattia Andreoletti, Jan Sprenger

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rubin, M. “Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher. Euro Jnl Phil Sci 10, 42 (2020). https://doi.org/10.1007/s13194-020-00309-6

  • DOI: https://doi.org/10.1007/s13194-020-00309-6
