-
Batch policy learning in average reward Markov decision processes Ann. Stat. (IF 4.5) Pub Date : 2022-12-21 Peng Liao,Zhengling Qi,Runzhe Wan,Predrag Klasnja,Susan A Murphy
We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal
-
Linear biomarker combination for constrained classification Ann. Stat. (IF 4.5) Pub Date : 2022-10-27 Yijian Huang,Martin G Sanda
Multiple biomarkers are often combined to improve disease diagnosis. The uniformly optimal combination, i.e., with respect to all reasonable performance metrics, unfortunately requires excessive distributional modeling, to which the estimation can be sensitive. An alternative strategy is rather to pursue local optimality with respect to a specific performance metric. Nevertheless, existing methods
-
Doubly debiased lasso: High-dimensional inference under hidden confounding Ann. Stat. (IF 4.5) Pub Date : 2022-06-16 Zijian Guo,Domagoj Ćevid,Peter Bühlmann
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting, where the measured covariates are affected by hidden confounding and propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously
-
Testability of high-dimensional linear models with nonsparse structures Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Jelena Bradic,Jianqing Fan,Yinchu Zhu
-
Inference for low-rank tensors—no need to debias Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Dong Xia,Anru R. Zhang,Yuchen Zhou
-
Statistical inference for principal components of spiked covariance matrices Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Zhigang Bao,Xiucai Ding,Jingming Wang,Ke Wang
-
Multivariate ranks and quantiles using optimal transport: Consistency, rates and nonparametric testing Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Promit Ghosal,Bodhisattva Sen
-
Surprises in high-dimensional ridgeless least squares interpolation Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Trevor Hastie,Andrea Montanari,Saharon Rosset,Ryan J. Tibshirani
-
Optimal false discovery rate control for large scale multiple testing with auxiliary information Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Hongyuan Cao,Jun Chen,Xianyang Zhang
-
Distributed nonparametric function estimation: Optimal rate of convergence and cost of adaptation Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 T. Tony Cai,Hongji Wei
-
General and feasible tests with multiply-imputed datasets Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Kin Wai Chan
-
Necessary and sufficient conditions for asymptotically optimal linear prediction of random fields on compact metric spaces Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Kristin Kirchner,David Bolin
-
Refined Cramér-type moderate deviation theorems for general self-normalized sums with applications to dependent random variables and winsorized mean Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Lan Gao,Qi-Man Shao,Jiasheng Shi
-
All-in-one robust estimator of the Gaussian mean Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Arnak S. Dalalyan,Arshak Minasyan
-
Functional sufficient dimension reduction through average Fréchet derivatives Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Kuang-Yao Lee,Lexin Li
-
False discovery rate control with unknown null distribution: Is it possible to mimic the oracle? Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Etienne Roquain,Nicolas Verzelen
-
Iterative algorithm for discrete structure recovery Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Chao Gao,Anderson Y. Zhang
-
Parametric copula adjusted for non- and semiparametric regression Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Yue Zhao,Irène Gijbels,Ingrid Van Keilegom
-
Adaptive estimation in multivariate response regression with hidden variables Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Xin Bing,Yang Ning,Yaosheng Xu
-
Edgeworth expansions for network moments Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Yuan Zhang,Dong Xia
-
Motif estimation via subgraph sampling: The fourth-moment phenomenon Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Bhaswar B. Bhattacharya,Sayan Das,Sumit Mukherjee
-
Inference for change points in high-dimensional data via self-normalization Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Runmin Wang,Changbo Zhu,Stanislav Volgushev,Xiaofeng Shao
-
Max-sum tests for cross-sectional independence of high-dimensional panel data Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Long Feng,Tiefeng Jiang,Binghui Liu,Wei Xiong
-
Sparse high-dimensional linear regression: Estimating squared error and a phase transition Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 David Gamarnik,Ilias Zadik
-
Reconciling design-based and model-based causal inferences for split-plot experiments Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Anqi Zhao,Peng Ding
-
Adaptive test of independence based on HSIC measures Ann. Stat. (IF 4.5) Pub Date : 2022-04-01 Mélisande Albert,Béatrice Laurent,Amandine Marrel,Anouar Meynaoui
-
Backfitting for large scale crossed random effects regressions Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Swarnadip Ghosh,Trevor Hastie,Art B. Owen
Regression models with crossed random effect errors can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as $N^{3/2}$ (or worse) for $N$ observations. Papaspiliopoulos et al. (2020) present a collapsed Gibbs sampler that costs $O(N)$, but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized
-
Robust sub-Gaussian estimation of a mean vector in nearly linear time Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Jules Depersin,Guillaume Lecué
We construct an algorithm, running in time $\tilde{\mathcal O}(Nd + uKd)$, which is robust to outliers and heavy-tailed data and which achieves the subgaussian rate of Lugosi and Mendelson, $$\sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}}+\sqrt{\frac{\|\Sigma\|_{\mathrm{op}}K}{N}},$$ with probability at least $1-\exp(-c_0K)-\exp(-c_1 u)$, where $\Sigma$ is the covariance
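Robust sub-Gaussian mean estimators of this kind typically build on the median-of-means principle. As a point of reference only (a minimal baseline, not the authors' algorithm), a coordinate-wise median-of-means estimate can be sketched as:

```python
import numpy as np

def median_of_means(x, n_blocks):
    """Coordinate-wise median-of-means estimate of the mean of x.

    Classical robust baseline: randomly split the sample into blocks,
    average within each block, then take the coordinate-wise median of
    the block means. Illustrative only; not the paper's algorithm.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    blocks = np.array_split(np.random.permutation(n), n_blocks)
    block_means = np.stack([x[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)
```

The median step makes the estimate insensitive to a small fraction of contaminated blocks, which is why a handful of gross outliers does not move it.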
-
Semiparametric latent-class models for multivariate longitudinal and survival data Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Kin Yau Wong,Donglin Zeng,D. Y. Lin
In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal
-
Canonical thresholding for nonsparse high-dimensional linear regression Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Igor Silin,Jianqing Fan
We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick the largest regression coefficients in the canonical form
-
Powerful knockoffs via minimizing reconstructability Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Asher Spector,Lucas Janson
The model-X knockoffs framework allows analysts to perform feature selection using almost any machine learning algorithm while still provably controlling the expected proportion of false discoveries. To apply model-X knockoffs, one must construct synthetic variables, called knockoffs, which effectively act as controls during feature selection. The gold standard for constructing knockoffs has been to minimize the
-
Approximate Message Passing algorithms for rotationally invariant matrices Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Zhou Fan
Approximate Message Passing (AMP) algorithms have seen widespread use across a variety of applications. However, the precise forms for their Onsager corrections and state evolutions depend on properties of the underlying random matrix ensemble, limiting the extent to which AMP algorithms derived for white noise may be applicable to data matrices that arise in practice. In this work, we study more general
-
Tensor clustering with planted structures: Statistical optimality and computational limits Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Yuetian Luo,Anru R. Zhang
This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of cluster (recovery). Specifically, we identify the sharp boundaries
-
High-dimensional asymptotics of likelihood ratio tests in the Gaussian sequence model under convex constraints Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Qiyang Han,Bodhisattva Sen,Yandi Shen
In the Gaussian sequence model $Y=\mu+\xi$, we study the likelihood ratio test (LRT) for testing $H_0: \mu=\mu_0$ versus $H_1: \mu \in K$, where $\mu_0 \in K$, and $K$ is a closed convex set in $\mathbb{R}^n$. In particular, we show that under the null hypothesis, normal approximation holds for the log-likelihood ratio statistic for a general pair $(\mu_0,K)$, in the high dimensional regime where the
-
Admissible ways of merging p-values under arbitrary dependence Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Vladimir Vovk,Bin Wang,Ruodu Wang
Methods of merging several p-values into a single p-value are important in their own right and widely used in multiple hypothesis testing. This paper is the first to systematically study the admissibility (in Wald's sense) of p-merging functions and their domination structure, without any assumptions on the dependence structure of the input p-values. As a technical tool we use the notion of e-values
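Two classical p-merging functions that remain valid under arbitrary dependence, the Bonferroni rule and the twice-the-average rule studied by Vovk and Wang, can be sketched in a few lines. This is an illustration of the objects under study, not code from the paper:

```python
def merge_twice_mean(pvals):
    """Merge p-values via twice the arithmetic mean.

    min(1, 2 * mean(p_1, ..., p_K)) is a valid p-value under arbitrary
    dependence of the inputs (a rule studied by Vovk and Wang).
    """
    return min(1.0, 2.0 * sum(pvals) / len(pvals))

def merge_bonferroni(pvals):
    """Bonferroni merging: min(1, K * min(p)).

    Also valid under arbitrary dependence, but conservative when the
    inputs are positively dependent.
    """
    return min(1.0, len(pvals) * min(pvals))
```

Which rule dominates depends on the configuration of the input p-values; characterizing such domination structure is exactly the kind of question the paper addresses.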
-
On least squares estimation under heteroscedastic and heavy-tailed errors Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Arun K. Kuchibhotla,Rohit K. Patra
We consider least squares estimation in a general nonparametric regression model. The rate of convergence of the least squares estimator (LSE) for the unknown regression function is well studied when the errors are sub-Gaussian. We find upper bounds on the rates of convergence of the LSE when the errors have uniformly bounded conditional variance and have only finitely many moments. We show that the
-
Fundamental barriers to high-dimensional regression with convex penalties Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Michael Celentano,Andrea Montanari
In high-dimensional regression, we attempt to estimate a parameter vector ${\boldsymbol \beta}_0\in{\mathbb R}^p$ from $n\lesssim p$ observations $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i\in{\mathbb R}^p$ is a vector of predictors and $y_i$ is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g. sparsity) of the estimate
-
Dimension reduction for functional data based on weak conditional moments Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Bing Li,Jun Song
We develop a general theory and estimation methods for functional linear sufficient dimension reduction, where both the predictor and the response can be random functions, or even vectors of functions. Unlike the existing dimension reduction methods, our approach does not rely on the estimation of conditional mean and conditional variance. Instead, it is based on a new statistical construction — the
-
An optimal statistical and computational framework for generalized tensor estimation Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Rungang Han,Rebecca Willett,Anru R. Zhang
This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of non-convexity in these problems, we introduce
-
Spatial dependence and space–time trend in extreme events Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 John H. J. Einmahl,Ana Ferreira,Laurens de Haan,Cláudia Neves,Chen Zhou
The statistical theory of extremes is extended to observations that are non-stationary and not independent. The non-stationarity over time and space is controlled via the scedasis (tail scale) in the marginal distributions. Spatial dependence stems from multivariate extreme value theory. We establish asymptotic theory for both the weighted sequential tail empirical process and the weighted tail quantile
-
On an extension of the promotion time cure model Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Jad Beyhum,Anouar El Ghouch,François Portier,Ingrid Van Keilegom
We consider the problem of estimating the distribution of time-to-event data that are subject to censoring and for which the event of interest might never occur, i.e., some subjects are cured. To model this kind of data in the presence of covariates, one of the leading semiparametric models is the promotion time cure model (Yakovlev et al., 1996), which adapts the Cox model to the presence of cured subjects
-
Minimax nonparametric estimation of pure quantum states Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Samriddha Lahiry,Michael Nussbaum
-
Isotonic regression with unknown permutations: Statistics, computation and adaptation Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Ashwin Pananjady,Richard J. Samworth
Motivated by models for multiway comparison data, we consider the problem of estimating a coordinate-wise isotonic function on the domain $[0, 1]^d$ from noisy observations collected on a uniform lattice, but where the design points have been permuted along each dimension. While the univariate and bivariate versions of this problem have received significant attention, our focus is on the multivariate
-
Deconvolution with unknown noise distribution is possible for multivariate signals Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Élisabeth Gassiat,Sylvain Le Corff,Luc Lehéricy
This paper considers the deconvolution problem in the case where the target signal is multidimensional and no information is known about the noise distribution. More precisely, no assumption is made on the noise distribution and no samples are available to estimate it: the deconvolution problem is solved based only on the corrupted signal observations. We establish the identifiability of the model
-
Minimax optimality of permutation tests Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Ilmun Kim,Sivaraman Balakrishnan,Larry Wasserman
Permutation tests are widely used in statistics, providing a finite-sample guarantee on the type I error rate whenever the distribution of the samples under the null hypothesis is invariant to some rearrangement. Despite its increasing popularity and empirical success, theoretical properties of the permutation test, especially its power, have not been fully explored beyond simple cases. In this paper
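For readers unfamiliar with the construction, a generic two-sample permutation test using the difference of means (a textbook sketch, not code from the paper) looks like:

```python
import numpy as np

def permutation_test(x, y, n_perm=2000, seed=0):
    """Two-sample permutation test with the |mean(x) - mean(y)| statistic.

    The returned p-value is valid whenever the pooled sample is
    exchangeable under the null hypothesis. Generic illustration only.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(perm[:n].mean() - perm[n:].mean()) >= observed:
            count += 1
    # add-one correction keeps the test level-valid at finite n_perm
    return (count + 1) / (n_perm + 1)
```

The finite-sample type I error guarantee mentioned in the abstract comes from exchangeability alone; the open question the paper studies is how much power such a test has.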
-
Testing community structure for hypergraphs Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Mingao Yuan,Ruiqi Liu,Yang Feng,Zuofeng Shang
-
Pattern graphs: A graphical approach to nonmonotone missing data Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Yen-Chi Chen
We introduce the concept of pattern graphs: directed acyclic graphs representing how response patterns are associated. A pattern graph represents an identifying restriction that is nonparametrically identified/saturated and is often a missing not at random restriction. We introduce selection model and pattern mixture model formulations using pattern graphs and show that they are equivalent
-
On minimax optimality of sparse Bayes predictive density estimates Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Gourab Mukherjee,Iain M. Johnstone
We study predictive density estimation under Kullback-Leibler loss in $\ell_0$-sparse Gaussian sequence models. We propose proper Bayes predictive density estimates and establish asymptotic minimaxity in sparse models. A surprise is the existence of a phase transition in the future-to-past variance ratio $r$. For $r < r_0 = (\sqrt{5} - 1)/4$, the natural discrete prior ceases to be asymptotically optimal
-
Heteroskedastic PCA: Algorithm, optimality, and applications Ann. Stat. (IF 4.5) Pub Date : 2022-02-01 Anru R. Zhang,T. Tony Cai,Yihong Wu
Principal component analysis (PCA) and singular value decomposition (SVD) are widely used in statistics, machine learning, and applied mathematics. It has been well studied in the case of homoskedastic noise, where the noise levels of the contamination are homogeneous. In this paper, we consider PCA and SVD in the presence of heteroskedastic noise, which arises naturally in a range of applications
-
Optimal adaptivity of signed-polygon statistics for network testing Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Jiashun Jin,Zheng Tracy Ke,Shengming Luo
Given a symmetric social network, we are interested in testing whether it has only one community or multiple communities. The desired tests should (a) accommodate severe degree heterogeneity, (b) accommodate mixed-memberships, (c) have a tractable null distribution, and (d) adapt automatically to different levels of sparsity, and achieve the optimal phase diagram. How to find such a test is a challenging
-
Extreme conditional expectile estimation in heavy-tailed heteroscedastic regression models Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Stéphane Girard,Gilles Stupfler,Antoine Usseglio-Carleve
Expectiles define a least squares analogue of quantiles. They have been the focus of a substantial quantity of research in the context of actuarial and financial risk assessment over the last decade. The behaviour and estimation of unconditional extreme expectiles using independent and identically distributed heavy-tailed observations has been investigated in a recent series of papers. We build here
-
Asymptotic properties of penalized spline estimators in concave extended linear models: Rates of convergence Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Jianhua Z. Huang,Ya Su
This paper develops a general theory on rates of convergence of penalized spline estimators for function estimation when the likelihood functional is concave in candidate functions, where the likelihood is interpreted in a broad sense that includes conditional likelihood, quasi-likelihood, and pseudo-likelihood. The theory allows all feasible combinations of the spline degree, the penalty order, and
-
Community detection on mixture multilayer networks via regularized tensor decomposition Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Bing-Yi Jing,Ting Li,Zhongyuan Lyu,Dong Xia
We study the problem of community detection in multi-layer networks, where pairs of nodes can be related in multiple modalities. We introduce a general framework, i.e., mixture multi-layer stochastic block model (MMSBM), which includes many earlier models as special cases. We propose a tensor-based algorithm (TWIST) to reveal both global/local memberships of nodes, and memberships of layers. We show
-
A simple measure of conditional dependence Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Mona Azadkia,Sourav Chatterjee
We propose a coefficient of conditional dependence between two random variables $Y$ and $Z$ given a set of other variables $X_1,\ldots,X_p$, based on an i.i.d. sample. The coefficient has a long list of desirable properties, the most important of which is that under absolutely no distributional assumptions, it converges to a limit in $[0,1]$, where the limit is $0$ if and only if $Y$ and $Z$ are conditionally
-
Adaptive learning rates for support vector machines working on data with low intrinsic dimension Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Thomas Hamm,Ingo Steinwart
We derive improved regression and classification rates for support vector machines using Gaussian kernels under the assumption that the data has some low-dimensional intrinsic structure that is described by the box-counting dimension. Under some standard regularity assumptions for regression and classification we prove learning rates, in which the dimension of the ambient space is replaced by the box-counting
-
Online inference with multi-modal likelihood functions Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Mathieu Gerber,Kari Heine
Let $(Y_t)_{t\geq 1}$ be a sequence of i.i.d. observations and $\{f_\theta,\theta\in \mathbb{R}^d\}$ be a parametric model. We introduce a new online algorithm for computing a sequence $(\hat{\theta}_t)_{t\geq 1}$ which is shown to converge almost surely to $\operatorname{argmax}_{\theta\in \mathbb{R}^d}\mathbb{E}[\log f_\theta(Y_1)]$ at rate $\mathcal{O}(\log(t)^{(1+\varepsilon)/2}t^{-1/2})$, with $\varepsilon>0$
-
Total variation regularized Fréchet regression for metric-space valued data Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Zhenhua Lin,Hans-Georg Müller
-
Analysis of generalized Bregman surrogate algorithms for nonsmooth nonconvex statistical learning Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Yiyuan She,Zhifeng Wang,Jiuwu Jin
Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming and many others as particular instances. The re-characterization via generalized Bregman functions enables us to construct
-
An asymptotic test for constancy of the variance under short-range dependence Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Sara K. Schmidt,Max Wornowizki,Roland Fried,Herold Dehling
We present a novel approach to test for heteroscedasticity of a non-stationary time series that is based on Gini's mean difference of logarithmic local sample variances. In order to analyse the large sample behaviour of our test statistic, we establish new limit theorems for U-statistics of dependent triangular arrays. We derive the asymptotic distribution of the test statistic under the null hypothesis
-
Wilks’ theorem for semiparametric regressions with weakly dependent data Ann. Stat. (IF 4.5) Pub Date : 2021-12-01 Marie du Roy de Chaumaray,Matthieu Marbac,Valentin Patilea
The empirical likelihood inference is extended to a class of semiparametric models for stationary, weakly dependent series. A partially linear single-index regression is used for the conditional mean of the series given its past, and the present and past values of a vector of covariates. A parametric model for the conditional variance of the series is added to capture further nonlinear effects. We