• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Hanwen Huang

Mean square error (MSE) of the estimator can be used to evaluate the performance of a regression model. In this paper, we derive the asymptotic MSE of $l_{1}$-penalized robust estimators in the limit of both sample size $n$ and dimension $p$ going to infinity with fixed ratio $n/p\rightarrow \delta$. We focus on the $l_{1}$-penalized least absolute deviation and $l_{1}$-penalized Huber’s regressions

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Anthony Lee; Sumeetpal S. Singh; Matti Vihola

The conditional particle filter (CPF) is a promising algorithm for general hidden Markov model smoothing. Empirical evidence suggests that the variant of CPF with backward sampling (CBPF) performs well even with long time series. Previous theoretical results have not been able to demonstrate the improvement brought by backward sampling, whereas we provide rates showing that CBPF can remain effective

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Olivier Ledoit; Michael Wolf

This paper establishes the first analytical formula for nonlinear shrinkage estimation of large-dimensional covariance matrices. We achieve this by identifying and mathematically exploiting a deep connection between nonlinear shrinkage and nonparametric estimation of the Hilbert transform of the sample spectral density. Previous nonlinear shrinkage methods were of numerical nature: QuEST requires numerical

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Dongming Huang; Lucas Janson

The recent paper Candès et al. (J. R. Stat. Soc. Ser. B. Stat. Methodol. 80 (2018) 551–577) introduced model-X knockoffs, a method for variable selection that provably and nonasymptotically controls the false discovery rate with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Kolyan Ray; Aad van der Vaart

We develop a semiparametric Bayesian approach for estimating the mean response in a missing data model with binary outcomes and a nonparametrically modelled propensity score. Equivalently, we estimate the causal effect of a treatment, correcting nonparametrically for confounding. We show that standard Gaussian process priors satisfy a semiparametric Bernstein–von Mises theorem under smoothness conditions

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Davy Paindaveine; Thomas Verdebout

Motivated by the fact that circular or spherical data are often highly concentrated around a location $\pmb{\theta }$, we consider inference about $\pmb{\theta }$ under high concentration asymptotic scenarios for which the probability of any fixed spherical cap centered at $\pmb{\theta }$ converges to one as the sample size $n$ diverges to infinity. Rather than restricting to Fisher–von Mises–Langevin

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Arun K. Kuchibhotla; Lawrence D. Brown; Andreas Buja; Junhui Cai; Edward I. George; Linda H. Zhao

Modern data-driven approaches to modeling make extensive use of covariate/model selection. Such selection incurs a cost: it invalidates classical statistical inference. A conservative remedy to the problem was proposed by Berk et al. (Ann. Statist. 41 (2013) 802–837) and further extended by Bachoc, Preinerstorfer and Steinberger (2016). These proposals, labeled “PoSI methods,” provide valid inference

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Emilia Pompe; Chris Holmes; Krzysztof Łatuszyński

We propose a new Monte Carlo method for sampling from multimodal distributions. The idea of this technique is based on splitting the task into two: finding the modes of a target distribution $\pi$ and sampling, given the knowledge of the locations of the modes. The sampling algorithm relies on steps of two types: local ones, preserving the mode; and jumps to regions associated with different modes

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Jeremy Heng; Adrian N. Bishop; George Deligiannidis; Arnaud Doucet

Sequential Monte Carlo methods, also known as particle methods, are a popular set of techniques for approximating high-dimensional probability distributions and their normalizing constants. These methods have found numerous applications in statistics and related fields; for example, for inference in nonlinear non-Gaussian state space models, and in complex static models. Like many Monte Carlo sampling

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Bhaswar B. Bhattacharya

In this paper, we consider the problem of testing the equality of two multivariate distributions based on geometric graphs constructed using the interpoint distances between the observations. These include the tests based on the minimum spanning tree and the $K$-nearest neighbor (NN) graphs, among others. These tests are asymptotically distribution-free, universally consistent and computationally efficient

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Chao Gao; Aad W. van der Vaart; Harrison H. Zhou

High dimensional statistics deals with the challenge of extracting structured information from complex model settings. Compared with a large number of frequentist methodologies, there are rather few theoretically optimal Bayes methods for high dimensional models. This paper provides a unified approach to both Bayes high dimensional statistics and Bayes nonparametrics in a general framework of structured

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Edgar Dobriban

Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how many components affect the data? This is an important problem, because decisions made here have a large impact on all downstream data analysis. Consequently, many approaches

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Shuaiwen Wang; Haolei Weng; Arian Maleki

We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations $n$ grows at the same rate as the number of predictors $p$. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds this

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Johannes O. Royset; Roger J-B Wets

We propose a unified framework for establishing existence of nonparametric $M$-estimators, computing the corresponding estimates, and proving their strong consistency when the class of functions is exceptionally rich. In particular, the framework addresses situations where the class of functions is complex involving information and assumptions about shape, pointwise bounds, location of modes, height

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Runmin Wang; Xiaofeng Shao

Self-normalization has attracted considerable attention in the recent literature of time series analysis, but its scope of applicability has been limited to low-/fixed-dimensional parameters for low-dimensional time series. In this article, we propose a new formulation of self-normalization for inference about the mean of high-dimensional stationary processes. Our original test statistic is a U-statistic

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Sophie Donnet; Vincent Rivoirard; Judith Rousseau

This paper studies nonparametric estimation of parameters of multivariate Hawkes processes. We consider the Bayesian setting and derive posterior concentration rates. First, rates are derived for $\mathbb{L}_{1}$-metrics for stochastic intensities of the Hawkes process. We then deduce rates for the $\mathbb{L}_{1}$-norm of interactions functions of the process. Our results are exemplified by using

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Jeong Min Jeon; Byeong U. Park

This paper develops a foundation of methodology and theory for the estimation of structured nonparametric regression models with Hilbertian responses. Our method and theory are focused on the additive model, while the main ideas may be adapted to other structured models. For this, the notion of Bochner integration is introduced for Banach-space-valued maps as a generalization of Lebesgue integration

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Angelika Rohde; Lukas Steinberger

We study the problem of estimating a functional $\theta ({\mathbb{P}})$ of an unknown probability distribution ${\mathbb{P}}\in {\mathcal{P}}$ in which the original iid sample $X_{1},\dots ,X_{n}$ is kept private even from the statistician via an $\alpha$-local differential privacy constraint. Let $\omega _{\mathrm{TV}}$ denote the modulus of continuity of the functional $\theta$ over ${\mathcal{P}}$

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Ethan X. Fang; Yang Ning; Runze Li

This paper concerns statistical inference for longitudinal data with ultrahigh dimensional covariates. We first study the problem of constructing confidence intervals and hypothesis tests for a low-dimensional parameter of interest. The major challenge is how to construct a powerful test statistic in the presence of high-dimensional nuisance parameters and sophisticated within-subject correlation of

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Michael Fauß; Abdelhak M. Zoubir; H. Vincent Poor

Under mild Markov assumptions, sufficient conditions for strict minimax optimality of sequential tests for multiple hypotheses under distributional uncertainty are derived. First, the design of optimal sequential tests for simple hypotheses is revisited, and it is shown that the partial derivatives of the corresponding cost function are closely related to the performance metrics of the underlying sequential

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Anderson Y. Zhang; Harrison H. Zhou

The mean field variational Bayes method is becoming increasingly popular in statistics and machine learning. Its iterative coordinate ascent variational inference algorithm has been widely applied to large scale Bayesian inference. See Blei et al. (2017) for a recent comprehensive review. Despite the popularity of the mean field method, there exists remarkably little fundamental theoretical justification

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Ismaël Castillo; Étienne Roquain

This paper explores a connection between empirical Bayes posterior distributions and false discovery rate (FDR) control. In the Gaussian sequence model this work shows that empirical Bayes-calibrated spike and slab posterior distributions allow a correct FDR control under sparsity. Doing so, it offers a frequentist theoretical validation of empirical Bayes methods in the context of multiple testing

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-09-19
Alexander Aue; Anne van Delft

Interest in functional time series has spiked in the recent past with papers covering both methodology and applications being published at a much increased pace. This article contributes to the research in this area by proposing a new stationarity test for functional time series based on frequency domain methods. The proposed test statistic is based on joint dimension reduction via functional principal

Updated: 2020-11-18
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Ethan Anderes; Jesper Møller; Jakob G. Rasmussen

We develop parametric classes of covariance functions on linear networks and their extension to graphs with Euclidean edges, that is, graphs with edges viewed as line segments or more general sets with a coordinate system allowing us to consider points on the graph which are vertices or points on an edge. Our covariance functions are defined on the vertices and edge points of these graphs and are isotropic

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Xiucai Ding; Zhou Zhou

We consider the estimation of and inference on precision matrices of a rich class of univariate locally stationary linear and nonlinear time series, assuming that only one realization of the time series is observed. Using a Cholesky decomposition technique, we show that the precision matrices can be directly estimated via a series of least squares linear regressions with smoothly time-varying coefficients

Updated: 2020-08-14
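
The Cholesky-regression idea mentioned in the Ding and Zhou entry above can be illustrated in its simplest textbook form. The sketch below is not the authors' estimator: it assumes a known covariance matrix `sigma` and fixed coefficients, whereas the paper works from a single realization with smoothly time-varying coefficients. The function name `precision_from_sequential_regressions` is hypothetical.

```python
import numpy as np

def precision_from_sequential_regressions(sigma):
    """Recover the precision matrix from regressions of each variable on its predecessors.

    Assumption: `sigma` is a known, positive definite covariance matrix
    (a population-level illustration, not an estimator from data).
    """
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    T = np.eye(p)          # unit lower-triangular matrix of negated regression coefficients
    d = np.empty(p)        # residual (innovation) variances
    d[0] = sigma[0, 0]
    for j in range(1, p):
        # Least squares coefficients of variable j on variables 0..j-1
        phi = np.linalg.solve(sigma[:j, :j], sigma[:j, j])
        T[j, :j] = -phi
        # Variance left over after removing the linear prediction
        d[j] = sigma[j, j] - sigma[:j, j] @ phi
    # T sigma T^T = diag(d), hence sigma^{-1} = T^T diag(1/d) T
    return T.T @ np.diag(1.0 / d) @ T
```

Because the residuals of the successive regressions are uncorrelated, the product `T.T @ diag(1/d) @ T` matches the direct inverse of `sigma`. For an AR(1) covariance it is tridiagonal.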
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Yu Liu; Zhao Ren

The last decade has witnessed significant methodological and theoretical advances in estimating large precision matrices. In particular, there are scientific applications such as longitudinal data, meteorology and spectroscopy in which the ordering of the variables can be interpreted through a bandable structure on the Cholesky factor of the precision matrix. However, the minimax theory has still been

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Marco Meyer; Efstathios Paparoditis; Jens-Peter Kreiss

Existing frequency domain methods for bootstrapping time series have a limited range. Essentially, these procedures cover the case of linear time series with independent innovations, and some even require the time series to be Gaussian. In this paper we propose a new frequency domain bootstrap method—the hybrid periodogram bootstrap (HPB)—which is consistent for a much wider range of stationary, even

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
James O. Berger; Dongchu Sun; Chengyuan Song

Bayesian analysis for the covariance matrix of a multivariate normal distribution has received a lot of attention in the last two decades. In this paper, we propose a new class of priors for the covariance matrix, including both inverse Wishart and reference priors as special cases. The main motivation for the new class is to have available priors—both subjective and objective—that do not “force eigenvalues

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Botond Szabó; Harry van Zanten

We study estimation methods under communication constraints in a distributed version of the nonparametric random design regression model. We derive minimax lower bounds and exhibit methods that attain those bounds. Moreover, we show that adaptive estimation is possible in this setting.

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Valentin Patilea; Ingrid Van Keilegom

In survival analysis it often happens that some subjects under study do not experience the event of interest; they are considered to be “cured.” The population is thus a mixture of two subpopulations, one of cured subjects and one of “susceptible” subjects. We propose a novel approach to estimate a mixture cure model when covariates are present and the lifetime is subject to random right censoring

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Pramita Bagchi; Holger Dette

The assumption of separability is a simplifying and very popular assumption in the analysis of spatiotemporal or hypersurface data structures. It is often made in situations where the covariance structure cannot be easily estimated, for example, because of a small sample size or because of computational storage problems. In this paper we propose a new and very simple test to validate this assumption

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Bryon Aragam; Chen Dan; Eric P. Xing; Pradeep Ravikumar

Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable by introducing a novel framework involving clustering overfitted parametric (i.e., misspecified) mixture models. These identifiability conditions generalize existing conditions in the literature and are flexible enough to include, for example, mixtures of infinite

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14

Introduced by Breiman (Mach. Learn. 45 (2001) 5–32), Random Forests are widely used classification and regression algorithms. While being initially designed as batch algorithms, several variants have been proposed to handle online learning. One particular instance of such forests is the Mondrian forest (In Adv. Neural Inf. Process. Syst. (2014) 3140–3148; In Proceedings of the 19th International Conference

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Thomas Porter; Michael Stewart

Higher criticism (HC) is a popular method for large-scale inference problems based on identifying unusually high proportions of small $p$-values. It has been shown to enjoy a lower-order optimality property in a simple normal location mixture model which is shared by the ‘tailor-made’ parametric generalised likelihood ratio test (GLRT) for the same model; however, HC has also been shown to perform

Updated: 2020-08-14
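
For readers unfamiliar with higher criticism, the statistic referred to in the Porter and Stewart entry above can be sketched as follows. This uses the common Donoho–Jin form, which compares the empirical proportion of small $p$-values with the proportion expected under uniformity. It is an illustration, not code from the paper. The function name `higher_criticism` and the truncation fraction `alpha0` are assumptions, and the $p$-values are assumed to lie strictly between 0 and 1.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of p-values.

    Assumptions: p-values lie strictly in (0, 1); `alpha0` is an
    illustrative truncation choice, not a value taken from the paper.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    # Standardized gap between the empirical proportion i/n and the
    # uniform expectation p_(i) at each order statistic
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))
    return float(z[:k].max())
```

A handful of very small $p$-values among otherwise uniform ones inflates the statistic sharply, which is the "unusually high proportion" signal the abstract describes.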
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Debarghya Ghoshdastidar; Maurilio Gutzeit; Alexandra Carpentier; Ulrike von Luxburg

The study of networks leads to a wide range of high-dimensional inference problems. In many practical applications, one needs to draw inference from one or few large sparse networks. The present paper studies hypothesis testing of graphs in this high-dimensional regime, where the goal is to test between two populations of inhomogeneous random graphs defined on the same set of $n$ vertices. The size

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Fengshuo Zhang; Chao Gao

We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood and variational class that characterize the convergence rates. Under similar “prior mass and testing” conditions considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Judith Rousseau; Botond Szabo

We investigate the frequentist coverage properties of (certain) Bayesian credible sets in a general, adaptive, nonparametric framework. It is well known that the construction of adaptive and honest confidence sets is not possible in general. To overcome this problem (in the context of sieve-type priors), we introduce an extra assumption on the functional parameters, the so-called “general polished tail”

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Shanshan Ding; Wei Qian; Lan Wang

This paper provides a unified framework and an efficient algorithm for analyzing high-dimensional survival data under weak modeling assumptions. In particular, it imposes neither parametric distributional assumption nor linear regression assumption. It only assumes that the survival time $T$ depends on a high-dimensional covariate vector $\mathbf{X}$ through low-dimensional linear combinations of covariates

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Veronika Ročková; Stéphanie van der Pas

Since their inception in the 1980s, regression trees have been one of the more widely used nonparametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Yuqi Gu; Gongjun Xu

Latent class models have wide applications in social and biological sciences. In many applications, prespecified restrictions are imposed on the parameter space of latent class models, through a design matrix, to reflect practitioners’ assumptions about how the observed responses depend on subjects’ latent traits. Though widely used in various fields, such restricted latent class models suffer from

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Xin Bing; Florentina Bunea; Yang Ning; Marten Wegkamp

This work introduces a novel estimation method, called LOVE, of the entries and structure of a loading matrix $A$ in a latent factor model $X=AZ+E$, for an observable random vector $X\in \mathbb{R}^{p}$, with correlated unobservable factors $Z\in \mathbb{R}^{K}$, with $K$ unknown, and uncorrelated noise $E$. Each row of $A$ is scaled, and allowed to be sparse. In order to identify the loading matrix

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Kevin McGoff; Andrew B. Nobel

A dynamical model consists of a continuous self-map $T:\mathcal{X}\to \mathcal{X}$ of a compact state space $\mathcal{X}$ and a continuous observation function $f:\mathcal{X}\to \mathbb{R}$. This paper considers the fitting of a parametrized family of dynamical models to an observed real-valued stochastic process using empirical risk minimization. The limiting behavior of the minimum risk parameters

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Edward H. Kennedy; Sivaraman Balakrishnan; Max G’Sell

It is well known that, without restricting treatment effect heterogeneity, instrumental variable (IV) methods only identify “local” effects among compliers, that is, those subjects who take treatment only when encouraged by the IV. Local effects are controversial since they seem to only apply to an unidentified subgroup; this has led many to denounce these effects as having little policy relevance

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Yihong Wu; Pengkun Yang

The method of moments (Philos. Trans. R. Soc. Lond. Ser. A 185 (1894) 71–110) is one of the most widely used methods in statistics for parameter estimation, by means of solving the system of equations that match the population and estimated moments. However, in practice and especially for the important case of mixture models, one frequently needs to contend with the difficulties of non-existence or

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Ching-Kang Ing

We investigate the prediction capability of the orthogonal greedy algorithm (OGA) in high-dimensional regression models with dependent observations. The rates of convergence of the prediction error of OGA are obtained under a variety of sparsity conditions. To prevent OGA from overfitting, we introduce a high-dimensional Akaike’s information criterion (HDAIC) to determine the number of OGA iterations

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
John E. Kolassa; Todd A. Kuffner

We consider a fundamental open problem in parametric Bayesian theory, namely the validity of the formal Edgeworth expansion of the posterior density. While the study of valid asymptotic expansions for posterior distributions constitutes a rich literature, the validity of the formal Edgeworth expansion has not been rigorously established. Several authors have claimed connections of various posterior

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Mayya Zhilova

We study the accuracy of bootstrap procedures for estimation of quantiles of a smooth function of a sum of independent sub-Gaussian random vectors. We establish higher-order approximation bounds with error terms depending explicitly on the sample size and the dimension. These results lead to improvements in the accuracy of a weighted bootstrap procedure for general log-likelihood ratio statistics. The key element

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Johannes Schmidt-Hieber

Johannes Schmidt-Hieber. Source: Annals of Statistics, Volume 48, Number 4, 1916–1921.

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14

Ohad Shamir. Source: Annals of Statistics, Volume 48, Number 4, 1911–1915.

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Michael Kohler; Sophie Langer

Michael Kohler, Sophie Langer. Source: Annals of Statistics, Volume 48, Number 4, 1906–1910.

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Gitta Kutyniok

I would like to congratulate Johannes Schmidt-Hieber on a very interesting paper in which he considers regression functions belonging to the class of so-called compositional functions and analyzes the ability of estimators based on the multivariate nonparametric regression model of deep neural networks to achieve minimax rates of convergence. In my discussion, I will first regard such a type of result

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Behrooz Ghorbani; Song Mei; Theodor Misiakiewicz; Andrea Montanari

Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Andrea Montanari. Source: Annals of Statistics, Volume 48, Number 4, 1898–1901.

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-08-14
Johannes Schmidt-Hieber

Consider the multivariate nonparametric regression model. It is shown that estimators based on sparsely connected deep neural networks with ReLU activation function and properly chosen network architecture achieve the minimax rates of convergence (up to $\log n$-factors) under a general composition assumption on the regression function. The framework includes many well-studied structural constraints

Updated: 2020-08-14
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Francesco Giordano; Soumendra Nath Lahiri; Maria Lucia Parrella

We consider nonparametric regression in high dimensions where only a relatively small subset of a large number of variables are relevant and may have nonlinear effects on the response. We develop methods for variable selection, structure discovery and estimation of the true low-dimensional regression function, allowing any degree of interactions among the relevant variables that need not be specified

Updated: 2020-07-17
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Haoran Li; Alexander Aue; Debashis Paul; Jie Peng; Pei Wang

We propose a two-sample test for detecting the difference between mean vectors in a high-dimensional regime based on a ridge-regularized Hotelling’s $T^{2}$. To choose the regularization parameter, a method is derived that aims at maximizing power within a class of local alternatives. We also propose a composite test that combines the optimal tests corresponding to a specific collection of local alternatives

Updated: 2020-07-17
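
As a rough illustration of the statistic named in the Li et al. entry above, a ridge-regularized Hotelling's $T^{2}$ replaces the inverse pooled covariance with the inverse of the pooled covariance plus a multiple of the identity. The sketch below shows only that basic quadratic form. The function name `ridge_hotelling_t2` and the parameter `lam` are assumptions, and it does not implement the paper's centering and scaling, its power-maximizing choice of the regularization parameter, or the composite test.

```python
import numpy as np

def ridge_hotelling_t2(x, y, lam):
    """Ridge-regularized two-sample Hotelling-type statistic (basic quadratic form).

    x: (n1, p) sample, y: (n2, p) sample, lam: positive ridge parameter.
    Assumption: this is the unstandardized statistic only; calibration
    (critical values) is outside the scope of this sketch.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Pooled sample covariance; singular whenever p > n1 + n2 - 2
    pooled = (xc.T @ xc + yc.T @ yc) / (n1 + n2 - 2)
    # The ridge term keeps the matrix invertible in the high-dimensional regime
    regularized = pooled + lam * np.eye(p)
    return float((n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(regularized, diff))
```

With `lam > 0` the statistic is well defined even when the dimension exceeds the combined sample size, which is the setting the abstract targets. Larger values of `lam` shrink the statistic toward a scaled squared Euclidean distance between the sample means.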
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Timothy I. Cannings; Thomas B. Berrett; Richard J. Samworth

We derive a new asymptotic expansion for the global excess risk of a local-$k$-nearest neighbour classifier, where the choice of $k$ may depend upon the test point. This expansion elucidates conditions under which the dominant contribution to the excess risk comes from the decision boundary of the optimal Bayes classifier, but we also show that if these conditions are not satisfied, then the dominant

Updated: 2020-07-17
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Lei Han; Kean Ming Tan; Ting Yang; Tong Zhang

A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. We propose a general subsampling scheme for large-scale multiclass logistic regression and examine the variance of the resulting

Updated: 2020-07-17
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Qingyuan Zhao; Jingshu Wang; Gibran Hemani; Jack Bowden; Dylan S. Small

Mendelian randomization (MR) is a method of exploiting genetic variation to unbiasedly estimate a causal effect in the presence of unmeasured confounding. MR is being widely used in epidemiology and other related areas of population science. In this paper, we study statistical inference in the increasingly popular two-sample summary-data MR design. We show a linear model for the observed associations approximately

Updated: 2020-07-17
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Matias D. Cattaneo; Max H. Farrell; Yingjie Feng

We present large sample results for partitioning-based least squares nonparametric regression, a popular method for approximating conditional expectation functions in statistics, econometrics and machine learning. First, we obtain a general characterization of their leading asymptotic bias. Second, we establish integrated mean squared error approximations for the point estimator and propose feasible

Updated: 2020-07-17
• Ann. Stat. (IF 2.65) Pub Date : 2020-07-17
Alois Kneip; Dominik Liebl

We propose a new reconstruction operator that aims to recover the missing parts of a function given the observed parts. This new operator belongs to a new, very large class of functional operators which includes the classical regression operators as a special case. We show the optimality of our reconstruction operator and demonstrate that the usually considered regression operators generally cannot

Updated: 2020-07-17
Contents have been reproduced by permission of the publishers.