Abstract
The author’s recent research papers, “Cumulative deviation of a subpopulation from the full population” and “A graphical method of cumulative differences between two subpopulations” (both published in volume 8 of Springer’s open-access Journal of Big Data during 2021), propose graphical methods and summary statistics, without extensively calibrating formal significance tests. The summary metrics and methods can measure the calibration of probabilistic predictions and can assess differences in responses between a subpopulation and the full population while controlling for a covariate or score via conditioning on it. These recently published papers construct significance tests based on the scalar summary statistics, but only sketch how to calibrate the attained significance levels (also known as “P-values”) for the tests. The present article reviews and synthesizes work spanning many decades in order to detail how to calibrate the P-values. It provides computationally efficient, easily implemented numerical methods for evaluating properly calibrated P-values, together with rigorous mathematical proofs guaranteeing their accuracy, and illustrates and validates the methods with open-source software and numerical examples.
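As a rough illustration of what evaluating a calibrated P-value can involve, the sketch below computes the tail probability of the maximum absolute value of standard Brownian motion on [0, 1], using the classical series P(max |B| <= x) = (4/π) Σ_{k>=0} (-1)^k/(2k+1) exp(-(2k+1)²π²/(8x²)). That an appropriately normalized summary statistic follows this limiting distribution under the null hypothesis is an assumption made here for illustration; the abstract does not state the normalization, and this is not the code from the cdeets repository. The function names and the truncation parameter `terms` are hypothetical.

```python
import math


def cdf_max_abs_brownian(x, terms=50):
    """Approximate P(max over 0<=t<=1 of |B(t)| <= x) for standard Brownian motion B.

    Uses the classical alternating series, truncated after `terms` summands.
    The truncation suits moderate x (roughly x < 10); it is a sketch only.
    """
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(terms):
        odd = 2 * k + 1
        total += (-1) ** k / odd * math.exp(-(odd * math.pi) ** 2 / (8.0 * x * x))
    # Clamp to [0, 1] to absorb rounding and truncation error.
    return min(1.0, max(0.0, 4.0 / math.pi * total))


def p_value_max_abs(statistic, terms=50):
    """Tail probability P(max |B| > statistic).

    Assumes `statistic` has already been normalized so that the Brownian
    limit applies (an assumption of this sketch).
    """
    return 1.0 - cdf_max_abs_brownian(statistic, terms)
```

For instance, `p_value_max_abs(2.2414)` comes out close to 0.05, matching the leading-order reflection-principle approximation 4(1 - Φ(x)) for the same tail.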
Data availability
The data sets generated and/or analyzed during the current study are available in the following repositories: (1) https://github.com/facebookresearch/cdeets (for all synthetic data sets) and (2) https://www2.census.gov/programs-surveys/acs/data/pums/2019/1-Year (for the California households file csv_hca.zip, which includes the file psam_h06.csv that our software processes, from the American Community Survey of the US Census Bureau). MIT-licensed open-source codes in Python 3 and shell scripts that automatically reproduce all figures and statistics of the present paper are publicly available in the repository cdeets at https://github.com/facebookresearch/cdeets
References
Tygert M.: Cumulative deviation of a subpopulation from the full population. J Big Data 8(117), 1–60 (2021b). https://arxiv.org/abs/2008.01779
Tygert M.: A graphical method of cumulative differences between two subpopulations. J Big Data 8(158), 1–29 (2021c). https://arxiv.org/abs/2108.02666
Kloumann I, Korevaar H, McConnell C, Tygert M, Zhao J.: Cumulative differences between paired samples. Tech. Rep. 2305.11323 (2023). arXiv: https://arxiv.org/abs/2305.11323
Tygert M.: Controlling for multiple covariates. Tech. Rep. 2112.00672 (2021a). arXiv: https://arxiv.org/abs/2112.00672
Arrieta-Ibarra I, Gujral P, Tannen J, Tygert M, Xu C.: Metrics of calibration for probabilistic predictions. J Mach Learn Res 23, 1–54 (2022). https://arxiv.org/abs/2205.09680
Lee D, Huang X, Hassani H, Dobriban E.: T-Cal: an optimal test for the calibration of predictive models. Tech. Rep. 2203.01850 (2022). arXiv
Delgado, M.A.: Testing the equality of nonparametric regression curves. Stat Probab Lett 17(3), 199–204 (1993)
Diebolt, J.: A nonparametric test for the regression function: asymptotic theory. J Stat Plan Inference 44(1), 1–17 (1995)
Stute, W.: Nonparametric model checks for regression. Ann Stat 25(2), 613–641 (1997)
Kuiper, N.H.: Tests concerning random points on a circle. Proc Koninklijke Nederlandse Akademie van Wetenschappen Series A 63, 38–47 (1962)
Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione (On the empirical determination of a distribution function). Giorn Ist Ital Attuar 4, 83–91 (1933)
Smirnov, N.: On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bulletin Mathématique de l’Université de Moscou 2(2), 3–11 (1939)
Feller, W.: The asymptotic distribution of the range of sums of independent random variables. Ann Math Stat 22(3), 427–432 (1951)
Darling, D.A., Siegert, A.J.F.: The first passage problem for a continuous Markov process. Ann Math Stat 24(4), 624–639 (1953)
Ciesielski, Z., Taylor, S.J.: First passage times and sojourn times for Brownian motion in space and the exact Hausdorff measure of the sample path. Trans Am Math Soc 103(3), 434–450 (1962)
Masoliver J.: Extreme values and the level-crossing problem: an application to the Feller process. Phys Rev E 89(4), 042106 (2014)
Acknowledgements
We would like to thank Kamalika Chaudhuri, Imanol Arrieta Ibarra, Michael Rabbat, Jonathan Tannen, Susan Zhang, and the anonymous reviewers.
Ethics declarations
Conflict of interest
Meta Platforms, Inc. employs the author. The author receives a salary and stock from Meta.
Additional information
Communicated by: Akil Narayan
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tygert, M. Calibration of P-values for calibration and for deviation of a subpopulation from the full population. Adv Comput Math 49, 70 (2023). https://doi.org/10.1007/s10444-023-10068-6
DOI: https://doi.org/10.1007/s10444-023-10068-6