-
An Undergraduate Course on the Statistical Principles of Research Study Design Am. Stat. (IF 1.8) Pub Date : 2025-5-21 Lee Kennedy-Shaffer
The undergraduate curriculum in statistics and data science is undergoing changes to accommodate new methods, newly interested students, and the changing role of statistics in society. Because of this, it is more important than ever that students understand the role of study design and how to formulate meaningful scientific and statistical research questions. While the traditional Design of Experiments
-
Zero-Truncated Modelling in a Meta-Analysis on Suicide Data after Bariatric Surgery Am. Stat. (IF 1.8) Pub Date : 2025-5-20 Layna Charlie Dennett, Antony Overstall, Dankmar Böhning
Meta-analysis is a well-established method for integrating results from several independent studies to estimate a common quantity of interest. However, meta-analysis is prone to selection bias, notably when particular studies are systematically excluded. This can lead to bias in estimating the quantity of interest. Motivated by a meta-analysis to estimate the rate of completed-suicide after bariatric
-
Flexible distributed lag models for count data using mgcv Am. Stat. (IF 1.8) Pub Date : 2025-5-18 Theo Economou, Daphne Parliari, Aurelio Tobias, Laura Dawkins, Hamish Steptoe, Christophe Sarran, Oliver Stoner, Rachel Lowe, Jos Lelieveld
In this tutorial we present the use of R package mgcv to implement Distributed Lag Non-Linear Models (DLNMs) in a flexible way. Interpretation of smoothing splines as random quantities enables approximate Bayesian inference, which in turn allows uncertainty quantification and comprehensive model checking. We illustrate various modeling situations using open-access epidemiological data in conjunction
-
The Loser’s Curse and the Critical Role of the Utility Function Am. Stat. (IF 1.8) Pub Date : 2025-5-16 Ryan S. Brill, Abraham J. Wyner
A longstanding question in the judgment and decision making literature is whether experts, even in high-stakes environments, exhibit the same cognitive biases observed in controlled experiments with inexperienced participants. Massey and Thaler (2013) claim to have found an example of bias and irrationality in expert decision making: general managers’ behavior in the National Football League draft
-
High Dimensional Space Oddity Am. Stat. (IF 1.8) Pub Date : 2025-5-15 Haim Bar, Vladimir Pozdnyakov
In his 1996 paper, Talagrand highlighted that the Law of Large Numbers (LLN) for independent random variables can be viewed as a geometric property of multidimensional product spaces. This phenomenon is known as the concentration of measure. To illustrate this profound connection between geometry and probability theory, we consider a seemingly intractable geometric problem in multidimensional Euclidean
-
Bayesian Inference and the Principle of Maximum Entropy Am. Stat. (IF 1.8) Pub Date : 2025-5-6 Duncan K. Foley, Ellis Scharfenaker
Bayes’ theorem incorporates distinct types of information through the likelihood and prior. Direct observations of state variables enter the likelihood and modify posterior probabilities through consistent updating. Information in terms of expected values of state variables modifies posterior probabilities by constraining prior probabilities to be consistent with the information. Constraints on the prior
-
Much Ado About Survey Tables: A Comparison of Chi-Square Tests and Software to Analyze Categorical Survey Data Am. Stat. (IF 1.8) Pub Date : 2025-5-5 Li-Yen R. Hu, Yulei He, Katherine E. Irimata, Vladislav Beresovsky
Chi-square tests are often employed to examine the association of categorical variables, the homogeneity of proportions between two or more samples, and the goodness-of-fit for a specified distribution. To account for the complex design of survey data, variants of chi-square tests as well as software packages that implement these tests have been developed. Nevertheless, from a survey practitioner’s
-
Analytics, Have Some Humility: A Statistical View of Fourth-Down Decision Making Am. Stat. (IF 1.8) Pub Date : 2025-4-18 Ryan S. Brill, Ronald Yurko, Abraham J. Wyner
The standard mathematical approach to fourth-down decision-making in American football is to make the decision that maximizes estimated win probability. Win probability estimates arise from machine learning models fit from historical data. These models attempt to capture a nuanced relationship between a noisy binary outcome variable and game-state variables replete with interactions and non-linearities
-
Play-by-Play Volleyball Win Probability Model Am. Stat. (IF 1.8) Pub Date : 2025-4-10 Nathan Hawkins, Gilbert W. Fellingham, Garritt L. Page
This paper introduces a volleyball point-by-point win probability model that updates the probability of winning a set after each play in the set. The covariate informed product partition model (PPMx) is well suited to flexibly include in-set team performance information when making predictions. However, making predictions in real time would be too expensive computationally as it would require refitting
-
Data Science in Practice. Am. Stat. (IF 1.8) Pub Date : 2025-4-9 Xiao Hui Tai
Tom Alby. Boca Raton, FL: Chapman & Hall/CRC Press, 2024, xvi + 301 pp., $200.00(H), ISBN: 978-1-032-50524-4. This book is a comprehensive introduction to data science, with a focus on how it is use...
-
Learn R: As a Language, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2025-4-9 Haihan Yu
Pedro J. Aphalo. Boca Raton, FL: Chapman & Hall/CRC Press, 2024, xvii + 447 pp., $220.00(H), ISBN: 978-1-032-51843-5. R programming has become an essential tool for data analysis and statistical com...
-
An Example to Illustrate Randomized Trial Estimands and Estimators Am. Stat. (IF 1.8) Pub Date : 2025-4-5 Linda J. Harrison, Sean S. Brummel
Recently, the International Conference on Harmonisation finalized an estimand framework for randomized trials that was adopted by regulatory bodies worldwide. The framework introduced five strategies for handling post-randomization events; namely the treatment policy, composite variable, while on treatment, hypothetical and principal stratum estimands. We describe an illustrative example to elucidate
-
Estimation of a Generalized Treatment Effect in a Control Group Versus Treatment Group Design Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Daniel R. Jeske
A control group versus treatment group design is considered where the responses in the treatment group are modeled as a two-component mixture model that accounts for the possibility that only a fraction of the patients in the treated group will respond to the treatment. In this setting, the treatment effect is generalized to include both the fraction of treated patients that respond to the treatment
-
Counternull Sets in Randomized Experiments Am. Stat. (IF 1.8) Pub Date : 2025-4-3 M.-A. C. Bind, D. B. Rubin
Consider a study whose primary results are “not statistically significant”. How often does it lead to the following published conclusion that “there is no effect of the treatment/exposure on the outcome”? We believe too often and that the requirement to report counternull values could help to avoid this! In statistical parlance, the null value of an estimand is a value that is distinguished in some
-
Foundations of Data Science with Python Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Qing Wang
Foundations of Data Science with Python, by John M. Shea, provides a comprehensive and modern introduction to data science. The book illustrates different aspects of working with data computational...
-
Modern Data Visualization with R Am. Stat. (IF 1.8) Pub Date : 2025-4-3 John M. Hoenig
This book is available in hardcover and as downloadable chapters on the internet. The author states he wants “to provide you with the tools to both select and create graphs that present data as cle...
-
Applied Machine Learning Using mlr3 in R Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Xueying Tang
Machine learning has become an important tool in scientific research and industry, driven in part by the availability of user-friendly software for model development. While most of the notable soft...
-
A New General Class of Discrete Bivariate Distributions Constructed by the Usual Stochastic Order Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Min Ju Lee, Na Young Yoo, Ji Hwan Cha
In this paper, we develop a new general class of discrete bivariate distributions that can model the effect of the so-called ‘load-sharing configuration’. Under such load-sharing configuration, after the failure of one component, the surviving component has to shoulder extra load, which eventually results in its failure at an earlier time than what is expected under the case of independence. To model
-
Causal Inference with Complex Surveys: A Unified Perspective on Sample Selection and Exposure Selection Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Giovanni Nattino, Robert Ashmead, Bo Lu
Probability surveys are a major source of population representative data for policy research and program evaluation. However, the data come with the added complications of being observational and selected with unequal probabilities. Propensity score adjustments have become increasingly popular for inferring causal relationships in non-randomized studies, but when using survey data, estimates of the
-
Cross-Validatory Z-Residual for Diagnosing Shared Frailty Models Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Tingxuan Wu, Cindy Feng, Longhai Li
Accurate model performance assessment in survival analysis is imperative for robust predictions and informed decision-making. Traditional residual diagnostic tools like martingale and deviance residuals lack a well-characterized reference distribution for censored regression, making numerical statistical tests based on these residuals challenging. Recently, the introduction of Z-residuals for diagnosing
-
Performance Analysis of NSUM Estimators in Social-Network Topologies Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Sergio Díaz-Aranda, Jose Aguilar, Juan Marcos Ramírez, David Rabanedo, Antonio Fernández Anta, Rosa E. Lillo
The Network Scale-up Methods (NSUM) are methods to estimate unknown populations based on indirect surveys in which the participants provide information about aggregated data of their acquaintances. This preserves privacy and may lead to higher participation. During the last thirty years, new NSUM estimators have emerged. However, conditions related to the design of the experiments and the robustness
-
A Pareto Tail Plot Without Moment Restrictions Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Bernhard Klar
We propose a mean functional that exists for arbitrary probability distributions and characterizes the Pareto distribution within the set of distributions with finite left endpoint. This is in sharp contrast to the mean excess plot, which is meaningless for distributions without an existing mean and has nonstandard behavior when the mean is finite, but the second moment does not exist. The construction
-
Sparse-Group Boosting: Unbiased Group and Variable Selection Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Fabian Obster, Christian Heumann
For grouped covariates, we propose a framework for boosting that allows for sparsity within and between groups. By using component-wise and group-wise gradient ridge boosting simultaneously with adjusted degrees of freedom or penalty parameters, a model with similar properties as the sparse-group lasso can be fitted through boosting. We show that within-group and between-group sparsity can be controlled
-
Additive Hazards Regression Analysis of Massive Interval-Censored Data via Data Splitting Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Peiyao Huang, Shuwei Li, Xinyuan Song
With the rapid development of data acquisition and storage space, massive datasets with large sample sizes are increasingly common, making more advanced statistical tools urgently needed. To accommodate such volume in the analysis, a variety of methods have been proposed for complete or right-censored survival data. However, existing development of big data methodology has
-
Selecting the Best Compositions of a Wheelchair Basketball Team: A Data-Driven Approach Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Gabriel Calvo, Carmen Armero, Bernd Grimm, Christophe Ley
Wheelchair basketball, regulated by the International Wheelchair Basketball Federation, is a sport designed for individuals with physical disabilities. This article presents a data-driven tool that effectively determines optimal team lineups based on past performance data and metrics for player effectiveness. Our proposed methodology involves combining a Bayesian longitudinal model with an integer
-
When Heavy Tails Disrupt Statistical Inference Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Richard M. Vogel, Simon Michael Papalexiou, Jonathan R. Lamontagne, Flannery C. Dolan
Heavy tails (HT) arise in many applications and their presence can disrupt statistical inference, yet the HT statistical literature requires a theoretical background most practicing statisticians lack. We provide an overview of the influence of HT on the performance of basic statistical methods and useful theorems aimed at the practitioner encountering HT in an applied setting. Higher or even lower
-
An Effective and Small Sample-Size Valid Confidence Interval for Isotonic Dose–Response Curves by Inverting a Partial Likelihood Ratio Test Am. Stat. (IF 1.8) Pub Date : 2025-4-3 J. G. Liao
A dose–response curve is essential for determining the safe dosage of a drug and is widely used in bioassay and in phase 1 clinical trials. It is generally accepted that the probability of death or the probability of dose-limiting toxicity is a nondecreasing function of the dose. This article proposes and develops an effective point-wise confidence interval for an isotonic dose–response curve, a problem
-
Estimation of Contact Time Among Animals from Telemetry Data Am. Stat. (IF 1.8) Pub Date : 2025-4-3 Andrew B. Whetten, Trevor J. Hefley, David A. Haukos
Continuous processes in most applications are measured discretely with error. This complicates the task of detecting intersections and the number of intersections between two continuous processes (i.e., when the processes have the same value). Intersections of continuous processes are scientifically important, but challenging to estimate from data. For example, in the field of animal ecology, intersections
-
An Efficient Computation Strategy for Generalized Single-Index Models and Their Variants by Integrating With GAM Am. Stat. (IF 1.8) Pub Date : 2025-4-1 Ximin Li, Haozhe Liang, Hua Liang
Various generalizations of single-index models and associated estimation methods have been developed. However, implementing these developed methods requires much effort to program, case by case, due to the lack of a common and flexible vehicle to cover them. We suggest an efficient computation strategy for easily estimating parameters and nonparametric functions in generalized single-index models and
-
Closed-Form Power and Sample Size Calculations for Bayes Factors Am. Stat. (IF 1.8) Pub Date : 2025-4-1 Samuel Pawel, Leonhard Held
Determining an appropriate sample size is a critical element of study design, and the method used to determine it should be consistent with the planned analysis. When the planned analysis involves Bayes factor hypothesis testing, the sample size is usually desired to ensure a sufficiently high probability of obtaining a Bayes factor indicating compelling evidence for a hypothesis, given that the hypothesis
-
Nonparametric Statistical Methods Using R, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2025-3-27 Bojana Milošević
John Kloke and Joseph McKean. Boca Raton, FL: Chapman & Hall/CRC Press, 2024, xvii + 447 pp., $91.99(H), ISBN: 978-0-367-65135-0. The second edition of Nonparametric Statistical Methods Using R subs...
-
Analyzing Spatial Point Patterns in Digital Pathology: Immune Cells in High-Grade Serous Ovarian Carcinomas Am. Stat. (IF 1.8) Pub Date : 2025-3-25 Jonatan A. González, Julia Wrobel, Simon Vandekar, Paula Moraga
Multiplex immunofluorescence (mIF) imaging technology facilitates the study of the tumor microenvironment in cancer patients. Due to the capabilities of this emerging bioimaging technique, it is possible to statistically analyze, for example, the co-varying location and functions of multiple different types of immune cells. Complex spatial relationships between different immune cells have been shown
-
Connections between Statistics and Mathematics/Probability Am. Stat. (IF 1.8) Pub Date : 2025-3-10 Michael A. Proschan, Pamela A. Shaw
There are many connections between probability, other mathematics courses, and statistics. Understanding these connections provides insights that might not be fully appreciated when considering each discipline in isolation. While the typical instruction of statistics courses relies on elucidating its foundational principles from mathematical and probability theory, it is generally less appreciated
-
A Class of Regression Association Measures based on Concordance Am. Stat. (IF 1.8) Pub Date : 2025-2-25 Jia-Han Shih, Yi-Hau Chen
Measures of regression association aiming at predictability of a dependent variable Y from an independent variable X have received considerable attention recently. In this article, we provide a unified discussion of some existing measures, including their rationale, properties, and estimation. Motivated by these measures, we consider a general class of regression association measures which views the
-
A Multiple Imputation Approach for the Cumulative Incidence, with Implications for Variance Estimation Am. Stat. (IF 1.8) Pub Date : 2025-2-24 Elizabeth C. Chase, Philip S. Boonstra, Jeremy M. G. Taylor
We present an alternative approach to estimating the cumulative incidence function that uses nonparametric multiple imputation to reduce the problem to that of estimating a binomial proportion. In the standard competing risks setting, we show mathematically and empirically that our imputation-based estimator is equivalent to the Aalen-Johansen estimator of the cumulative incidence given a sufficient
-
Laplace’s Law of Succession Estimator and M-Statistics Am. Stat. (IF 1.8) Pub Date : 2025-2-24 Eugene Demidenko
The classic formula for estimating the binomial probability as the proportion of successes contradicts common sense for extreme probabilities when the event never occurs or occurs every time. Laplace’s law of succession estimator, one of the first applications of Bayesian statistics, has been around for over 250 years and resolves the paradoxes, although rarely discussed in modern statistics texts
-
Tightening Blocks in Complementary Analyses of Observational Studies: Optimization Algorithm and Examples Am. Stat. (IF 1.8) Pub Date : 2024-08-20 Paul R. Rosenbaum
An observational block design has I blocks matched for covariates and J individuals per block, but treatments were not randomly assigned to individuals within blocks, as would have been done in an ...
-
Using Exact Tests from Algebraic Statistics in Sparse Multi-way Analyses: An Application to Analyzing Differential Item Functioning Am. Stat. (IF 1.8) Pub Date : 2024-08-12 Shishir Agrawal, Luis David Garcia Puente, Minho Kim, Flavia Sancier-Barbosa
Asymptotic goodness-of-fit methods in contingency table analysis can struggle with sparse data, especially in multi-way tables where it can be infeasible to meet sample size requirements for a robu...
-
Distance Covariance, Independence, and Pairwise Differences Am. Stat. (IF 1.8) Pub Date : 2024-07-03 Jakob Raymaekers, Peter J. Rousseeuw
Distance covariance (Székely et al. 2007) is a fascinating recent notion, which is popular as a test for dependence of any type between random variables X and Y. This approach deserves to be touche...
-
A Multi-Method Data Science Pipeline for Analyzing Police Service Am. Stat. (IF 1.8) Pub Date : 2024-07-01 Anna Haensch, Daanika Gordon, Karin Knudson, Justina Cheng
Despite the fact that most police departments in the U.S. serve jurisdictions with fewer than 10,000 residents, policing practices in small towns are understudied. This is due in part to data limit...
-
High-dimensional propensity score and its machine learning extensions in residual confounding control Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Mohammad Ehsanul Karim
The use of health care claims datasets often encounters criticism due to the pervasive issues of omitted variables and inaccuracies or mis-measurements in available confounders. Ultimately, the tr...
-
Integrative data analysis where partial covariates have complex non-linear effects by using summary information from an external data Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Jia Liang, Shuo Chen, Peter Kochunov, L. Elliot Hong, Chixiang Chen
A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functi...
-
On Misuses of the Kolmogorov–Smirnov Test for One-Sample Goodness-of-Fit Am. Stat. (IF 1.8) Pub Date : 2024-06-12 Anthony Zeimbekakis, Elizabeth D. Schifano, Jun Yan
The Kolmogorov–Smirnov (KS) test is widely employed to assess the goodness-of-fit of a hypothesized continuous distribution to a sample. Despite its popularity, the test is frequently misused in th...
-
Assessment and Continuous Improvement of an Undergraduate Data Science Program Am. Stat. (IF 1.8) Pub Date : 2024-06-07 Nicholas Clark, Christopher Morrell, Mike Powell
In recent years, there has been an explosion in the growth of undergraduate statistics and data science programs across the US. Simultaneously, there has been clear guidance written on curriculum d...
-
Analyzing Matched 2 × 2 Tables from all Corners Am. Stat. (IF 1.8) Pub Date : 2024-05-31 Marc Aerts, Geert Molenberghs
Squared 2 × 2 tables with binary data from matched pairs are typically analyzed using Cochran-Mantel-Haenszel methodology, conditional logistic regression, or random intercepts logistic regression....
-
Sequential monitoring using the Second Generation P-Value with Type I error controlled by monitoring frequency Am. Stat. (IF 1.8) Pub Date : 2024-05-28 Jonathan J. Chipman, Robert A. Greevy Jr., Lindsay Mayberry, Jeffrey D. Blume
The Second Generation P-Value (SGPV) measures the overlap between an estimated interval and a composite hypothesis of parameter values. We develop a sequential monitoring scheme of the SGPV (SeqSGP...
-
Binomial Confidence Intervals for Rare Events: Importance of Defining Margin of Error Relative to Magnitude of Proportion Am. Stat. (IF 1.8) Pub Date : 2024-05-24 Owen McGrath, Kevin Burke
Confidence interval performance is typically assessed in terms of two criteria: coverage probability and interval width (or margin of error). In this article, we assess the performance of four comm...
-
The R2D2 prior for generalized linear mixed models Am. Stat. (IF 1.8) Pub Date : 2024-05-09 Eric Yanchenko, Howard D. Bondell, Brian J. Reich
In Bayesian analysis, the selection of a prior distribution is typically done by considering each parameter in the model. While this can be convenient, in many scenarios it may be desirable to plac...
-
Boldness-Recalibration for Binary Event Predictions Am. Stat. (IF 1.8) Pub Date : 2024-05-13 Adeline P. Guthrie, Christopher T. Franck
Probability predictions are essential to inform decision making across many fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, that is, spread out enou...
-
The Best Time to Play the Lottery Am. Stat. (IF 1.8) Pub Date : 2024-05-07 Christopher M. Rump
The best time to play the lottery is when the jackpot has rolled over several times and grown large, but not so large that you must share the prize if you win. We examine maximizing the expected va...
-
A Simple and Fast Algorithm for Generating Correlation Matrices with a Known Average Correlation Coefficient Am. Stat. (IF 1.8) Pub Date : 2024-05-02 Niels G. Waller
This article describes a simple and fast algorithm for generating correlation matrices (R) with a known average correlation. The algorithm should be useful for researchers desiring plausible R m...
-
Tractable Bayesian Inference For An Unidentified Simple Linear Regression Model Am. Stat. (IF 1.8) Pub Date : 2024-04-24 Robert Calvert Jump
In this article, I propose a tractable approach to Bayesian inference in a simple linear regression model for which the standard exogeneity assumption does not hold. By specifying a beta prior for ...
-
Telling Stories with Data: With Applications in R Am. Stat. (IF 1.8) Pub Date : 2024-04-23 Piotr Fryzlewicz
Published in The American Statistician (Vol. 78, No. 4, 2024)
-
Deep Learning and Scientific Computing with R torch Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Yang Ni
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
An Introduction to R and Python for Data Analysis: A Side-by-Side Approach. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Gabriel Wallin
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
On Point Estimators for Gamma and Beta Distributions Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nickos D. Papadatos
Let $X_1,\ldots,X_n$ be a random sample from the gamma distribution with density $f(x)=\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$, $x>0$, where both $\alpha>0$ (the shape parameter) and $\lambda>0$ (the reciprocal scale parameter) are unknown. The m...
-
Introduction to Statistical Modelling and Inference Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nianpin Cheng, Beth Chance
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Statistical Theory: A Concise Introduction, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Juan Sosa
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Moments of the Nonnegative Adjusted Estimator of Squared Multiple Correlation Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Joseph F. Lucke
I present the moments of the nonnegative adjusted estimator of the squared multiple correlation $\rho^2$, the coefficient of determination for random-predictor regression. This estimator, first proposed ...
-
Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement Am. Stat. (IF 1.8) Pub Date : 2024-04-15 Minh Nguyen, Tiffany Eulalio, Ben J. Marafino, Christian Rose, Jonathan H. Chen, Michael Baiocchi
A gap remains between developing risk prediction models and deploying models to support real-world decision making, especially in high-stakes situations. Human-experts’ reasoning abilities remain c...