-
Tightening Blocks in Complementary Analyses of Observational Studies: Optimization Algorithm and Examples Am. Stat. (IF 1.8) Pub Date : 2024-08-20 Paul R. Rosenbaum
An observational block design has I blocks matched for covariates and J individuals per block, but treatments were not randomly assigned to individuals within blocks, as would have been done in an ...
-
Using Exact Tests from Algebraic Statistics in Sparse Multi-way Analyses: An Application to Analyzing Differential Item Functioning Am. Stat. (IF 1.8) Pub Date : 2024-08-12 Shishir Agrawal, Luis David Garcia Puente, Minho Kim, Flavia Sancier-Barbosa
Asymptotic goodness-of-fit methods in contingency table analysis can struggle with sparse data, especially in multi-way tables where it can be infeasible to meet sample size requirements for a robu...
-
Distance Covariance, Independence, and Pairwise Differences Am. Stat. (IF 1.8) Pub Date : 2024-07-03 Jakob Raymaekers, Peter J. Rousseeuw
Distance covariance (Székely et al. 2007) is a fascinating recent notion, which is popular as a test for dependence of any type between random variables X and Y. This approach deserves to be touche...
-
A Multi-Method Data Science Pipeline for Analyzing Police Service Am. Stat. (IF 1.8) Pub Date : 2024-07-01 Anna Haensch, Daanika Gordon, Karin Knudson, Justina Cheng
Despite the fact that most police departments in the U.S. serve jurisdictions with fewer than 10,000 residents, policing practices in small towns are understudied. This is due in part to data limit...
-
High-dimensional propensity score and its machine learning extensions in residual confounding control Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Mohammad Ehsanul Karim
The use of health care claims datasets often encounters criticism due to the pervasive issues of omitted variables and inaccuracies or mis-measurements in available confounders. Ultimately, the tr...
-
Integrative data analysis where partial covariates have complex non-linear effects by using summary information from an external data Am. Stat. (IF 1.8) Pub Date : 2024-06-17 Jia Liang, Shuo Chen, Peter Kochunov, L. Elliot Hong, Chixiang Chen
A full parametric and linear specification may be insufficient to capture complicated patterns in studies exploring complex features, such as those investigating age-related changes in brain functi...
-
Assessment and Continuous Improvement of an Undergraduate Data Science Program Am. Stat. (IF 1.8) Pub Date : 2024-06-07 Nicholas Clark, Christopher Morrell, Mike Powell
In recent years, there has been an explosion in the growth of undergraduate statistics and data science programs across the US. Simultaneously, there has been clear guidance written on curriculum d...
-
Sequential monitoring using the Second Generation P-Value with Type I error controlled by monitoring frequency Am. Stat. (IF 1.8) Pub Date : 2024-05-28 Jonathan J. Chipman, Robert A. Greevy Jr., Lindsay Mayberry, Jeffrey D. Blume
The Second Generation P-Value (SGPV) measures the overlap between an estimated interval and a composite hypothesis of parameter values. We develop a sequential monitoring scheme of the SGPV (SeqSGP...
-
On Misuses of the Kolmogorov–Smirnov Test for One-Sample Goodness-of-Fit Am. Stat. (IF 1.8) Pub Date : 2024-05-20 Anthony Zeimbekakis, Elizabeth D. Schifano, Jun Yan
The Kolmogorov–Smirnov (KS) test is widely employed to assess the goodness-of-fit of a hypothesized continuous distribution to a sample. Despite its popularity, the test is frequently misused in th...
-
The R2D2 prior for generalized linear mixed models Am. Stat. (IF 1.8) Pub Date : 2024-05-09 Eric Yanchenko, Howard D. Bondell, Brian J. Reich
In Bayesian analysis, the selection of a prior distribution is typically done by considering each parameter in the model. While this can be convenient, in many scenarios it may be desirable to plac...
-
The Best Time to Play the Lottery Am. Stat. (IF 1.8) Pub Date : 2024-05-07 Christopher M. Rump
The best time to play the lottery is when the jackpot has rolled over several times and grown large, but not so large that you must share the prize if you win. We examine maximizing the expected va...
-
A Simple and Fast Algorithm for Generating Correlation Matrices with a Known Average Correlation Coefficient Am. Stat. (IF 1.8) Pub Date : 2024-05-02 Niels G. Waller
This article describes a simple and fast algorithm for generating correlation matrices (R) with a known average correlation. The algorithm should be useful for researchers desiring plausible R m...
-
Binomial Confidence Intervals for Rare Events: Importance of Defining Margin of Error Relative to Magnitude of Proportion Am. Stat. (IF 1.8) Pub Date : 2024-05-02 Owen McGrath, Kevin Burke
Confidence interval performance is typically assessed in terms of two criteria: coverage probability and interval width (or margin of error). In this paper, we assess the performance of four common...
-
Analyzing Matched 2 × 2 Tables from all Corners Am. Stat. (IF 1.8) Pub Date : 2024-05-02 Marc Aerts, Geert Molenberghs
Squared 2 × 2 tables with binary data from matched pairs are typically analysed using Cochran-Mantel-Haenszel methodology, conditional logistic regression, or random intercepts logistic regression....
-
Telling Stories with Data: With Applications in R Am. Stat. (IF 1.8) Pub Date : 2024-04-23 Piotr Fryzlewicz
Published in The American Statistician (Ahead of Print, 2024)
-
Deep Learning and Scientific Computing with R torch Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Yang Ni
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
An Introduction to R and Python for Data Analysis: A Side-by-Side Approach. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Gabriel Wallin
Published in The American Statistician (Vol. 78, No. 2, 2024)
-
On Point Estimators for Gamma and Beta Distributions Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nickos D. Papadatos
Let $X_1,\ldots,X_n$ be a random sample from the gamma distribution with density $f(x)=\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}/\Gamma(\alpha)$, $x>0$, where both $\alpha>0$ (the shape parameter) and $\lambda>0$ (the reciprocal scale parameter) are unknown. The m...
-
Introduction to Statistical Modelling and Inference Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Nianpin Cheng, Beth Chance
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Statistical Theory: A Concise Introduction, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2024-04-17 Juan Sosa
Published in The American Statistician (Vol. 78, No. 3, 2024)
-
Boldness-Recalibration for Binary Event Predictions Am. Stat. (IF 1.8) Pub Date : 2024-04-04 Adeline P. Guthrie, Christopher T. Franck
Probability predictions are essential to inform decision making across many fields. Ideally, probability predictions are (i) well calibrated, (ii) accurate, and (iii) bold, i.e., spread out enough ...
-
Tractable Bayesian inference for an unidentified simple linear regression model Am. Stat. (IF 1.8) Pub Date : 2024-03-26 Robert Calvert Jump
In this paper, I propose a tractable approach to Bayesian inference in a simple linear regression model for which the standard exogeneity assumption does not hold. By specifying a beta prior for th...
-
Moments of the Nonnegative Adjusted Estimator of Squared Multiple Correlation Am. Stat. (IF 1.8) Pub Date : 2024-03-18 Joseph F. Lucke
I present the moments of the nonnegative adjusted estimator of the squared multiple correlation $\rho^2$, the coefficient of determination for random-predictor regression. This estimator, first proposed...
-
On the Term “Randomization Test” Am. Stat. (IF 1.8) Pub Date : 2024-03-18 Jesse Hemerik
There is no consensus on the meaning of the term “randomization test.” Contradictory uses of the term are leading to confusion, misunderstandings and indeed invalid data analyses. A main source of ...
-
Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities Am. Stat. (IF 1.8) Pub Date : 2024-03-11 Yifan Yang, Chixiang Chen, Shuo Chen
Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, am...
-
Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement Am. Stat. (IF 1.8) Pub Date : 2024-03-11 Minh Nguyen, Tiffany Eulalio, Ben Marafino, Christian Rose, Jonathan H. Chen, Michael Baiocchi
A gap remains between developing risk prediction models and deploying models to support real-world decision making, especially in high-stakes situations. Human-experts’ reasoning abilities remain c...
-
Parole Board Decision-Making using Adversarial Risk Analysis Am. Stat. (IF 1.8) Pub Date : 2024-02-13 Chaitanya Joshi, Charné Nel, Javier Cano, Devon L.L. Polaschek
Adversarial Risk Analysis (ARA) allows for much more realistic modeling of game theoretical decision problems than Bayesian game theory. While ARA solutions for various applications have been discu...
-
Fitting log-Gaussian Cox processes using generalized additive model software Am. Stat. (IF 1.8) Pub Date : 2024-02-08 Elliot Dovers, Jakub Stoklosa, David I. Warton
While log-Gaussian Cox process regression models are useful tools for modeling point patterns, they can be technically difficult to fit and require users to learn/adopt bespoke software. We show th...
-
Prioritizing Variables for Observational Study Design using the Joint Variable Importance Plot Am. Stat. (IF 1.8) Pub Date : 2024-02-08 Lauren D. Liao, Yeyi Zhu, Amanda L. Ngo, Rana F. Chehab, Samuel D. Pimentel
Observational studies of treatment effects require adjustment for confounding variables. However, causal inference methods typically cannot deliver perfect adjustment on all measured baseline varia...
-
Hidden Markov Models for Low-Frequency Earthquake Recurrence Am. Stat. (IF 1.8) Pub Date : 2024-02-05 Jessica Allen, Ting Wang
Low-frequency earthquakes (LFEs) are small magnitude earthquakes with frequencies of 1–10 Hertz which often occur in overlapping sequence forming persistent seismic tremors. They provide insights i...
-
Applied Linear Regression for Longitudinal Data: With an Emphasis on Missing Observations Am. Stat. (IF 1.8) Pub Date : 2024-02-05 Maria Francesca Marino
Published in The American Statistician (Vol. 78, No. 1, 2024)
-
Proximal MCMC for Bayesian Inference of Constrained and Regularized Estimation Am. Stat. (IF 1.8) Pub Date : 2024-01-23 Xinkai Zhou, Qiang Heng, Eric C. Chi, Hua Zhou
This paper advocates proximal Markov Chain Monte Carlo (ProxMCMC) as a flexible and general Bayesian inference framework for constrained or regularized estimation. Originally introduced in the Baye...
-
Hitting a Prime by Rolling a Die with Infinitely Many Faces Am. Stat. (IF 1.8) Pub Date : 2024-01-05 Shane Chern
Alon and Malinovsky recently proved that it takes on average 2.42849… rolls of fair six-sided dice until the first time the total sum of all rolls arrives at a prime. Naturally, one may extend the ...
-
Understanding the Implications of a Complete Case Analysis for Regression Models with a Right-Censored Covariate Am. Stat. (IF 1.8) Pub Date : 2023-12-21 Marissa C. Ashner, Tanya P. Garcia
Despite its drawbacks, the complete case analysis is commonly used in regression models with incomplete covariates. Understanding when the complete case analysis will lead to consistent parameter e...
-
Using Conformal Win Probability to Predict the Winners of the Canceled 2020 NCAA Basketball Tournaments Am. Stat. (IF 1.8) Pub Date : 2023-12-21 Chancellor Johnstone, Dan Nettleton
The COVID-19 pandemic was responsible for the cancellation of both the men’s and women’s 2020 National Collegiate Athletic Association (NCAA) Division I basketball tournaments. Starting from the po...
-
Lessons from a Discussion-Based Course on the History of Statistics Am. Stat. (IF 1.8) Pub Date : 2023-12-21 David B. Hitchcock
A special-topics undergraduate course about the history of statistics which was taught in Spring 2023 at the University of South Carolina is described. We review other similar courses (past and cur...
-
One-Step Weighting to Generalize and Transport Treatment Effect Estimates to a Target Population Am. Stat. (IF 1.8) Pub Date : 2023-12-11 Ambarish Chattopadhyay, Eric R. Cohn, José R. Zubizarreta
The problems of generalization and transportation of treatment effect estimates from a study sample to a target population are central to empirical research and statistical methodology. In both ran...
-
The Phistogram Am. Stat. (IF 1.8) Pub Date : 2023-11-17 Adriana Verónica Blanc
This article introduces a new kind of histogram-based representation for univariate random variables, named the phistogram because of its perceptual qualities. The technique relies on shifted group...
-
Missing Data Imputation with High-Dimensional Data Am. Stat. (IF 1.8) Pub Date : 2023-11-17 Alberto Brini, Edwin R. van den Heuvel
Imputation of missing data in high-dimensional datasets with more variables P than samples N, P≫N, is hampered by the data dimensionality. For multivariate imputation, the covariance matrix is ill ...
-
Technical Validation of Plot Designs by Use of Deep Learning Am. Stat. (IF 1.8) Pub Date : 2023-11-16 Anne Helby Petersen, Claus Ekstrøm
When does inspecting a certain graphical plot allow for an investigator to reach the right statistical conclusion? Visualizations are commonly used for various tasks in statistics—including model d...
-
A Note on Monte Carlo Integration in High Dimensions Am. Stat. (IF 1.8) Pub Date : 2023-11-16 Yanbo Tang
Monte Carlo integration is a commonly used technique to compute intractable integrals and is typically thought to perform poorly for very high-dimensional integrals. To show that this is not always...
-
Causal Quartets: Different Ways to Attain the Same Average Treatment Effect Am. Stat. (IF 1.8) Pub Date : 2023-11-15 Andrew Gelman, Jessica Hullman, Lauren Kennedy
The average causal effect can often be best understood in the context of its variation. We demonstrate with two sets of four graphs, all of which represent the same average effect but with much dif...
-
Bayesian Modeling and Computation in Python Am. Stat. (IF 1.8) Pub Date : 2023-10-31 P. Richard Hahn
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
A First Course in Linear Model Theory, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2023-10-31 Carlos Cinelli
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
ANOVA and Mixed Models: A Short Introduction Using R Am. Stat. (IF 1.8) Pub Date : 2023-10-31 Brady T. West
Published in The American Statistician (Vol. 77, No. 4, 2023)
-
Comment on “Forbidden knowledge and specialized training: A versatile solution for the two main sources of overfitting in linear regression,” by Rohlfs (2023) Am. Stat. (IF 1.8) Pub Date : 2023-10-30 Ronald Christensen
Published in The American Statistician (Just accepted, 2023)
-
The Application of the Likelihood Ratio Test and the Cochran-Mantel-Haenszel Test to Discrimination Cases Am. Stat. (IF 1.8) Pub Date : 2023-10-20 Weiwen Miao, Joseph L. Gastwirth
In practice, the ultimate outcome of many important discrimination cases, for example, the Wal-Mart, Nike and Goldman-Sachs equal pay cases, is determined at the stage when the plaintiffs request t...
-
Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology Am. Stat. (IF 1.8) Pub Date : 2023-10-18 Nicholas Larsen, Jonathan Stallrich, Srijan Sengupta, Alex Deng, Ron Kohavi, Nathaniel T. Stevens
The rise of internet-based services and products in the late 1990s brought about an unprecedented opportunity for online businesses to engage in large scale data-driven decision making. Over the pa...
-
Melded Confidence Intervals Do Not Provide Guaranteed Coverage Am. Stat. (IF 1.8) Pub Date : 2023-10-16 Jesse Frey, Yimin Zhang
Melded confidence intervals were proposed as a way to combine two independent one-sample confidence intervals to obtain a two-sample confidence interval for a quantity like a difference or a ratio....
-
Bayesian Detection of Bias in Peremptory Challenges Using Historical Strike Data Am. Stat. (IF 1.8) Pub Date : 2023-10-02 Sachin S. Pandya, Xiaomeng Li, Eric Barón, Timothy E. Moore
United States law bars using peremptory strikes during jury selection because of prospective juror race, ethnicity, sex, or membership in certain other cognizable classes. Here, we extend a Bayesia...
-
Differentially Private Methods for Releasing Results of Stability Analyses Am. Stat. (IF 1.8) Pub Date : 2023-09-27 Chengxin Yang, Jerome P. Reiter
Data stewards and analysts can promote transparent and trustworthy science and policy-making by facilitating assessments of the sensitivity of published results to alternate analysis choices. For e...
-
Multiple-Model-based Robust Estimation of Causal Treatment Effect on a Binary Outcome with Integrated Information from Secondary Outcomes Am. Stat. (IF 1.8) Pub Date : 2023-09-22 Chixiang Chen, Shuo Chen, Qi Long, Sudeshna Das, Ming Wang
An assessment of the causal treatment effect in the development and progression of certain diseases is important in clinical trials and biomedical studies. However, it is not possible to infer a ca...
-
Bivariate Analysis of Distribution Functions Under Biased Sampling Am. Stat. (IF 1.8) Pub Date : 2023-09-22 Hsin-wen Chang, Shu-Hsiang Wang
This article compares distribution functions among pairs of locations in their domains, in contrast to the typical approach of univariate comparison across individual locations. This bivariate appr...
-
Counting the Unseen: Estimation of Susceptibility Proportions in Zero-Inflated Models Using a Conditional Likelihood Approach Am. Stat. (IF 1.8) Pub Date : 2023-09-22 Wen-Han Hwang, Lu-Fang Chen, Jakub Stoklosa
Zero-inflated count data models are widely used in various fields such as ecology, epidemiology, and transportation, where count data with a large proportion of zeros is prevalent. Despite their wi...
-
Enhanced Inference for Finite Population Sampling-Based Prevalence Estimation with Misclassification Errors Am. Stat. (IF 1.8) Pub Date : 2023-09-21 Lin Ge, Yuzi Zhang, Lance A. Waller, Robert H. Lyles
Epidemiologic screening programs often make use of tests with small, but nonzero probabilities of misdiagnosis. In this article, we assume the target population is finite with a fixed number of tru...
-
Here Comes the STRAIN: Analyzing Defensive Pass Rush in American Football with Player Tracking Data Am. Stat. (IF 1.8) Pub Date : 2023-09-14 Quang Nguyen, Ronald Yurko, Gregory J. Matthews
In American football, a pass rush is an attempt by the defensive team to disrupt the offense and prevent the quarterback (QB) from completing a pass. Existing metrics for assessing pass rush perfor...
-
Likelihood-Free Parameter Estimation with Neural Bayes Estimators Am. Stat. (IF 1.8) Pub Date : 2023-08-17 Matthew Sainsbury-Dale, Andrew Zammit-Mangion, Raphaël Huser
Neural Bayes estimators are neural networks that approximate Bayes estimators. They are fast, likelihood-free, and amenable to rapid bootstrap-based uncertainty quantification. In this article, we ...
-
First-Passage Times for Random Partial Sums: Yadrenko’s Model for e and Beyond Am. Stat. (IF 1.8) Pub Date : 2023-08-10 Joel E. Cohen
M. I. Yadrenko discovered that the expectation of the minimum number $N_1$ of independent and identically distributed uniform random variables on (0, 1) that have to be added to exceed 1 is e. For any...
-
Event History Analysis with R, 2nd ed. Am. Stat. (IF 1.8) Pub Date : 2023-07-31 Ding-Geng Chen
Published in The American Statistician (Vol. 77, No. 3, 2023)
-
Introducing Variational Inference in Statistics and Data Science Curriculum Am. Stat. (IF 1.8) Pub Date : 2023-07-24 Vojtech Kejzlar, Jingchen Hu
Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing, are increasingly more present in both undergraduate and gradu...