
-
Fast computation of latent correlations J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-02-16 Grace Yoon; Christian L. Müller; Irina Gaynanova
Abstract Latent Gaussian copula models provide a powerful means to perform multi-view data integration since these models can seamlessly express dependencies between mixed variable types (binary, continuous, zero-inflated) via latent Gaussian correlations. The estimation of these latent correlations, however, comes at considerable computational cost, having prevented the routine use of these models
-
Multi-resolution filters for massive spatio-temporal data J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-02-10 Marcin Jurek; Matthias Katzfuss
Abstract Spatio-temporal datasets are rapidly growing in size. For example, environmental variables are measured with increasing resolution by increasing numbers of automated sensors mounted on satellites and aircraft. Using such data, which are typically noisy and incomplete, the goal is to obtain complete maps of the spatio-temporal process, together with uncertainty quantification. We focus here
-
Model-based microbiome data ordination: A variational approximation approach J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-02-02 Yanyan Zeng; Hongyu Zhao; Tao Wang
Abstract The coevolution between human and bacteria colonizing the human body has profound implications for health and development, with a growing body of evidence linking the altered microbiome composition with a wide array of disease states. Yet dimension reduction and visualization analysis of microbiome data are still in their infancy and many challenges exist. In this paper we introduce a general
-
Assessment and adjustment of approximate inference algorithms using the law of total variance J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-31 Xuejun Yu; David J. Nott; Minh-Ngoc Tran; Nadja Klein
Abstract A common method for assessing validity of Bayesian sampling or approximate inference methods makes use of simulated data replicates for parameters drawn from the prior. Under continuity assumptions, quantiles of functions of the simulated parameter values for corresponding posterior distributions are uniformly distributed. Checking for uniformity when a posterior density is approximated numerically
-
False Discovery Rates to Detect Signals from Incomplete Spatially Aggregated Data J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-25 Hsin-Cheng Huang; Noel Cressie; Andrew Zammit-Mangion; Guowen Huang
Abstract There are a number of ways to test for the absence/presence of a spatial signal in a completely observed fine-resolution image. One of these is a powerful nonparametric procedure called Enhanced False Discovery Rate (EFDR). A drawback of EFDR is that it requires the data to be defined on regular pixels in a rectangular spatial domain. Here, we develop an EFDR procedure for possibly incomplete
-
Robust Approximate Bayesian Inference with Synthetic Likelihood J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-21 David T. Frazier; Christopher Drovandi
Abstract Bayesian synthetic likelihood (BSL) is now an established method for conducting approximate Bayesian inference in models where, due to the intractability of the likelihood function, exact Bayesian approaches are either infeasible or computationally too demanding. Implicit in the application of BSL is the assumption that the data generating process (DGP) can produce simulated summary statistics
-
Estimating Multiple Precision Matrices with Cluster Fusion Regularization J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Bradley S. Price; Aaron J. Molstad; Ben Sherwood
Abstract We propose a penalized likelihood framework for estimating multiple precision matrices from different classes. Most existing methods either incorporate no information on relationships between the precision matrices, or require this information be known a priori. The framework proposed in this article allows for simultaneous estimation of the precision matrices and relationships between the
-
Sequential Learning of Active Subspaces J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Nathan Wycoff; Mickaël Binois; Stefan M. Wild
Abstract In recent years, active subspace methods (ASMs) have become a popular means of performing subspace sensitivity analysis on black-box functions. Naively applied, however, ASMs require gradient evaluations of the target function. In the event of noisy, expensive, or stochastic simulators, evaluating gradients via finite differencing may be infeasible. In such cases, often a surrogate model is
-
Sampling based estimation of in-degree distribution for directed complex networks J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Nelson Antunes; Shankar Bhamidi; Tianjian Guo; Vladas Pipiras; Bang Wang
Abstract The focus of this work is on estimation of the in-degree distribution in directed networks from sampling network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of that node are visible, that is, the
-
Maximum Likelihood Estimation and Graph Matching in Errorfully Observed Networks J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Jesús Arroyo; Daniel L. Sussman; Carey E. Priebe; Vince Lyzinski
Abstract Given a pair of graphs with the same number of vertices, the inexact graph matching problem consists in finding a correspondence between the vertices of these graphs that minimizes the total number of induced edge disagreements. We study this problem from a statistical framework in which one of the graphs is an errorfully observed copy of the other. We introduce a corrupting channel model
-
Predictive Distribution Modelling Using Transformation Forests J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Torsten Hothorn; Achim Zeileis
Abstract Regression models for supervised learning problems with a continuous response are commonly understood as models for the conditional mean of the response given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding
-
A Pseudo-Likelihood Approach to Linear Regression with Partially Shuffled Data J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-19 Martin Slawski; Guoqing Diao; Emanuel Ben-David
Abstract Recently, there has been significant interest in linear regression in the situation where predictors and responses are not observed in matching pairs corresponding to the same statistical unit as a consequence of separate data collection and uncertainty in data integration. Mismatched pairs can considerably impact the model fit and disrupt the estimation of regression parameters. In this paper
-
Online Updating of Survival Analysis J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-14 Jing Wu; Ming-Hui Chen; Elizabeth D. Schifano; Jun Yan
Abstract When large amounts of survival data arrive in streams, conventional estimation methods become computationally infeasible since they require access to all observations at each accumulation point. We develop online updating methods for carrying out survival analysis under the Cox proportional hazards model in an online-update framework. Our methods are also applicable with time-dependent covariates
-
A Projection Pursuit Forest Algorithm for Supervised Classification J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-14 Natalia da Silva; Dianne Cook; Eun-Kyung Lee
Abstract This paper presents a new ensemble learning method for classification problems called projection pursuit random forest (PPF). PPF uses the PPtree algorithm introduced in Lee et al. (2013). In PPF, trees are constructed by splitting on linear combinations of randomly chosen variables. Projection pursuit is used to choose a projection of the variables that best separates the classes. Utilizing
-
Cluster Optimized Proximity Scaling J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-14 Thomas Rusch; Patrick Mair; Kurt Hornik
Abstract Proximity scaling methods such as Multidimensional Scaling (MDS) represent objects in a low dimensional configuration so that fitted object distances optimally approximate object proximities. Besides finding the optimal configuration, an additional goal may be to make statements about the cluster arrangement of objects. This fails if the configuration lacks appreciable clusteredness. We present
-
Model selection with Lasso-Zero: adding straw to the haystack to better find needles J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-14 Pascaline Descloux; Sylvain Sardy
Abstract The high-dimensional linear model $y = X\beta^0 + \epsilon$ is considered and the focus is put on the problem of recovering the support $S^0$ of the sparse vector $\beta^0$. We introduce a new $\ell_1$-based estimator, called Lasso-Zero, whose novelty resides in the repeated use of noise dictionaries concatenated to $X$ for overfitting the response. Lasso-Zero is an extension of Thresholded Basis Pursuit, for which
-
Distributed Bayesian Inference in Linear Mixed-Effects Models J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-14 Sanvesh Srivastava; Yixiang Xu
Abstract Linear mixed-effects models play a fundamental role in statistical methodology. A variety of Markov chain Monte Carlo (MCMC) algorithms exist for fitting these models, but they are inefficient in massive data settings because every iteration of any such MCMC algorithm passes through the full data. Many divide-and-conquer methods have been proposed to solve this problem, but they lack theoretical
-
Detecting Anomalous Time Series by GAMLSS-Akaike-Weights-Scoring J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-04 Cole Sodja
Abstract An extensible statistical framework for detecting anomalous time series including those with heavy-tailed distributions and nonstationarity in higher-order moments is introduced based on penalized likelihood distributional regression. Specifically, generalized additive models for location, scale, and shape are used to infer sample path representations defined by a parametric distribution with
-
Adaptive Bayesian Spectral Analysis of High-dimensional Nonstationary Time Series J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-04 Zeda Li; Ori Rosen; Fabio Ferrarelli; Robert T. Krafty
Abstract This article introduces a nonparametric approach to spectral analysis of a high-dimensional multivariate nonstationary time series. The procedure is based on a novel frequency-domain factor model that provides a flexible yet parsimonious representation of spectral matrices from a large number of simultaneously observed time series. Real and imaginary parts of the factor loading matrices are
-
Alternating Pruned Dynamic Programming for Multiple Epidemic Change-Point Estimation J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-04 Zifeng Zhao; Chun Yip Yau
Abstract In this paper, we study the problem of multiple change-point detection for a univariate sequence under the epidemic setting, where the behavior of the sequence alternates between a common normal state and different epidemic states. This is a non-trivial generalization of the classical (single) epidemic change-point testing problem. To explicitly incorporate the alternating structure of the
-
Quasi-random sampling for multivariate distributions via generative neural networks J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-04 Marius Hofert; Avinash Prasad; Mu Zhu
Abstract Generative moment matching networks (GMMNs) are introduced for generating approximate quasi-random samples from multivariate models with any underlying copula in order to compute estimates with variance reduction. So far, quasi-random sampling for multivariate distributions required a careful design, exploiting specific properties (such as conditional distributions) of the implied parametric
-
Trimmed Constrained Mixed Effects Models: Formulations and Algorithms J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2021-01-04 Peng Zheng; Ryan Barber; Reed Sorensen; Christopher Murray; Aleksandr Aravkin
Abstract Mixed effects (ME) models inform a vast array of problems in the physical and social sciences, and are pervasive in meta-analysis. We consider ME models where the random effects component is linear. We then develop an efficient approach for a broad problem class that allows nonlinear measurements, priors, and constraints, and finds robust estimates in all of these cases using trimming in the
-
Fast Markov chain Monte Carlo for high dimensional Bayesian regression models with shrinkage priors J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-12-22 Rui Jin; Aixin Tan
Abstract In the past decade, many Bayesian shrinkage models have been developed for linear regression problems where the number of covariates, p, is large. Computation of the intractable posterior is often done with three-block Gibbs samplers (3BG), based on representing the shrinkage priors as scale mixtures of Normal distributions. An alternative computing tool is a state-of-the-art Hamiltonian
-
Monte Carlo simulation on the Stiefel manifold via polar expansion J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-12-07 Michael Jauch; Peter D. Hoff; David B. Dunson
Abstract Motivated by applications to Bayesian inference for statistical models with orthogonal matrix parameters, we present polar expansion, a general approach to Monte Carlo simulation from probability distributions on the Stiefel manifold. To bypass many of the well-established challenges of simulating from the distribution of a random orthogonal matrix Q , we construct a distribution for an unconstrained
-
Tensor Canonical Correlation Analysis with Convergence and Statistical Guarantees J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-12-04 You-Lin Chen; Mladen Kolar; Ruey S. Tsay
Abstract In many applications, such as classification of images or videos, it is of interest to develop a framework for tensor data instead of an ad-hoc way of transforming data to vectors due to the computational and under-sampling issues. In this paper, we study convergence and statistical properties of two-dimensional canonical correlation analysis (Lee and Choi, 2007) under an assumption that data
-
Forward Stepwise Deep Autoencoder-based Monotone Nonlinear Dimensionality Reduction Methods J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-12-04 Youyi Fong; Jun Xu
Abstract Dimensionality reduction is an unsupervised learning task aimed at creating a low-dimensional summary and/or extracting the most salient features of a dataset. Principal components analysis (PCA) is a linear dimensionality reduction method in the sense that each principal component is a linear combination of the input variables. To allow features that are nonlinear functions of the input variables
-
An explicit mean-covariance parameterization for multivariate response linear regression J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-23 Aaron J. Molstad; Guangwei Weng; Charles R. Doss; Adam J. Rothman
Abstract We develop a new method to fit the multivariate response linear regression model that exploits a parametric link between the regression coefficient matrix and the error covariance matrix. Specifically, we assume that the correlations between entries in the multivariate error random vector are proportional to the cosines of the angles between their corresponding regression coefficient matrix
-
Additive Functional Cox Model J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-23 Erjia Cui; Ciprian M. Crainiceanu; Andrew Leroux
Abstract We propose the Additive Functional Cox Model to flexibly quantify the association between functional covariates and time to event data. The model extends the linear functional proportional hazards model by allowing the association between the functional covariate and log hazard to vary non-linearly in both the functional domain and the value of the functional covariate. Additionally, we introduce
-
Change point detection for graphical models in the presence of missing values J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-23 Malte Londschien; Solt Kovács; Peter Bühlmann
Abstract We propose estimation methods for change points in high-dimensional covariance structures with an emphasis on challenging scenarios with missing values. We advocate three imputation like methods and investigate their implications on common losses used for change point detection. We also discuss how model selection methods have to be adapted to the setting of incomplete data. The methods are
-
Kriging Riemannian Data via Random Domain Decompositions J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-20 Alessandra Menafoglio; Davide Pigoli; Piercesare Secchi
Abstract Data taking value on a Riemannian manifold and observed over a complex spatial domain are becoming more frequent in applications, e.g. in environmental sciences and in geoscience. The analysis of these data needs to rely on local models to account for the non stationarity of the generating random process, the nonlinearity of the manifold and the complex topology of the domain. In this paper
-
MIP-BOOST: Efficient and Effective L0 Feature Selection for Linear Regression J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-17 Ana Kenney; Francesca Chiaromonte; Giovanni Felici
Abstract Recent advances in mathematical programming have made Mixed Integer Optimization a competitive alternative to popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. Here we propose MIP-BOOST, a revision of standard Mixed Integer Programming feature selection
-
LowCon: A design-based subsampling approach in a misspecified linear model J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-02 Cheng Meng; Rui Xie; Abhyuday Mandal; Xinlian Zhang; Wenxuan Zhong; Ping Ma
Abstract We consider a measurement constrained supervised learning problem, that is, (1) a full sample of the predictors is given; (2) the response observations are unavailable and expensive to measure. Thus, it is ideal to select a subsample of predictor observations, measure the corresponding responses, and then fit the supervised learning model on the subsample of the predictors and responses. However
-
Nonparametric Anomaly Detection on Time Series of Graphs J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-02 Dorcas Ofori-Boateng; Yulia R. Gel; Ivor Cribben
Abstract Identifying change points and/or anomalies in dynamic network structures has become increasingly popular across various domains, from neuroscience to telecommunication to finance. One of the particular objectives of the anomaly detection task from the neuroscience perspective is the reconstruction of the dynamic manner of brain region interactions. However, most statistical methods for detecting
-
Modeling non-stationary extreme dependence with stationary max-stable processes and multidimensional scaling J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-11-02 Clément Chevalier; Olivia Martius; David Ginsbourger
Abstract Modeling the joint distribution of extreme events at multiple locations is a challenging task with important applications. In this study, we use max-stable models to study extreme daily precipitation events in Switzerland. The non-stationarity of the spatial process at hand involves important challenges, which are often dealt with by using a stationary model in a so-called climate space, with
-
Scalable Algorithms for Large Competing Risks Data J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-29 Eric S. Kawaguchi; Jenny I. Shen; Marc A. Suchard; Gang Li
Abstract This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate $\ell_0$-based iteratively reweighted $\ell_2$-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistributional hazards (PSH) model. In
-
Particle MCMC with Poisson Resampling: Parallelization and Continuous Time Models J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-26 Tomasz Cakala; Blazej Miasojedow; Wojciech Niemiro
Abstract We introduce a new version of the particle filter in which the number of “children” of a particle at a given time has a Poisson distribution. As a result, the number of particles is random and varies with time. An advantage of this scheme is that descendants of different particles can evolve independently. This makes it easy to parallelize computations. Moreover, the particle filter with Poisson resampling
-
Bayesian Variable Selection for Gaussian copula regression models J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-26 A. Alexopoulos; L. Bottolo
Abstract We develop a novel Bayesian method to select important predictors in regression models with multiple responses of diverse types. A sparse Gaussian copula regression model is used to account for the multivariate dependencies between any combination of discrete and/or continuous responses and their association with a set of predictors. We utilize the parameter expansion for data augmentation
-
Penalized Quantile Regression for Distributed Big Data Using the Slack Variable Representation J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-26 Ye Fan; Nan Lin; Xianjun Yin
Abstract Penalized quantile regression is a widely used tool for analyzing high-dimensional data with heterogeneity. Although its estimation theory has been well studied in the literature, its computation still remains a challenge in big data, due to the nonsmoothness of the check loss function and the possible nonconvexity of the penalty term. In this paper, we propose the QPADM-slack method, a parallel
-
Likelihood Evaluation of Jump-Diffusion Models Using Deterministic Nonlinear Filters* J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-26 Jean-François Bégin; Mathieu Boudreault
Abstract In this study, we develop a deterministic nonlinear filtering algorithm based on a high-dimensional version of Kitagawa (1987) to evaluate the likelihood function of models that allow for stochastic volatility and jumps whose arrival intensity is also stochastic. We show numerically that the deterministic filtering method is precise and much faster than the particle filter, in addition to
-
Local Linear Forests J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-10-15 Rina Friedberg; Julie Tibshirani; Susan Athey; Stefan Wager
Abstract Random forests are a powerful method for non-parametric regression, but are limited in their ability to fit smooth signals. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence
-
Markov Chain Importance Sampling – a highly efficient estimator for MCMC J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-28 Ingmar Schuster; Ilja Klebanov
Abstract Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm
-
Asymptotically exact data augmentation: models, properties and algorithms J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-28 Maxime Vono; Nicolas Dobigeon; Pierre Chainais
Abstract Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique to improve convergence properties, simplify the implementation or reduce the computational time of inference methods such as Markov chain Monte Carlo ones. Nonetheless, introducing appropriate auxiliary variables while preserving the initial target probability distribution and offering a computationally
-
Non-reversible jump algorithms for Bayesian nested model selection J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-28 Philippe Gagnon; Arnaud Doucet
Abstract Non-reversible Markov chain Monte Carlo methods often outperform their reversible counterparts in terms of asymptotic variance of ergodic averages and mixing properties. Lifting the state-space (Chen et al., 1999; Diaconis et al., 2000) is a generic technique for constructing such samplers. The idea is to think of the random variables we want to generate as position variables and to associate
-
d-blink: Distributed End-to-End Bayesian Entity Resolution J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-23 Neil G. Marchant; Andee Kaplan; Daniel N. Elazar; Benjamin I. P. Rubinstein; Rebecca C. Steorts
Entity resolution (ER; also known as record linkage or de-duplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing models
-
A Slice Tour for Finding Hollowness in High-Dimensional Data J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-07-16 Ursula Laa; Dianne Cook; German Valencia
Taking projections of high-dimensional data is a common analytical and visualization technique in statistics for working with high-dimensional problems. Sectioning, or slicing, through high dimensions is less common, but can be useful for visualizing data with concavities, or nonlinear structure. It is associated with conditional distributions in statistics, and also linked brushing between plots in
-
Assessing and Visualizing Simultaneous Simulation Error J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-18 Nathan Robertson; James M. Flegal; Dootika Vats; Galin L. Jones
Monte Carlo experiments produce samples in order to estimate features such as means and quantiles of a given distribution. However, simultaneous estimation of means and quantiles has received little attention. In this setting we establish a multivariate central limit theorem for any finite combination of sample means and quantiles under the assumption of a strongly mixing process, which includes the
-
Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-16 Congyuan Yang; Carey E. Priebe; Youngser Park; David J. Marchette
Our problem of interest is to cluster vertices of a graph by identifying underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods because it is easy to implement while often outperforming more traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating
-
Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-11 Indrayudh Ghosal; Giles Hooker
In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest has a reduced
-
Global Consensus Monte Carlo J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-08 Lewis J. Rendell; Adam M. Johansen; Anthony Lee; Nick Whiteley
To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each
-
Model-based edge clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-04 Daniel K. Sewell
Relational data can be studied using network analytic techniques which define the network as a set of actors and a set of edges connecting these actors. One important facet of network analysis that receives significant attention is community detection. However, while most community detection algorithms focus on clustering the actors of the network, it is very intuitive to cluster the edges. Connections
-
An Exact Auxiliary Variable Gibbs Sampler for a Class of Diffusions J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-01 Qi Wang; Vinayak Rao; Yee Whye Teh
Stochastic differential equations (SDEs) or diffusions are continuous-valued continuous-time stochastic processes widely used in the applied and mathematical sciences. Simulating paths from these processes is usually an intractable problem, and typically involves time-discretization approximations. We propose an exact Markov chain Monte Carlo sampling algorithm that involves no such time-discretization
-
Improving Bayesian Local Spatial Models in Large Data Sets J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-01 Amanda Lenzi; Stefano Castruccio; Håvard Rue; Marc G. Genton
Environmental processes resolved at a sufficiently small scale in space and time inevitably display non-stationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global non-stationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated
-
Shrinking the Covariance Matrix using Convex Penalties on the Matrix-Log Transformation J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-01 Mengxi Yi; David E. Tyler
For q-dimensional data, penalized versions of the sample covariance matrix are important when the sample size is small or modest relative to q. Since the negative log-likelihood under multivariate normal sampling is convex in $\Sigma^{-1}$, the inverse of the covariance matrix, it is common to consider additive penalties which are also convex in $\Sigma^{-1}$. More recently, Deng and Tsui (2013) and Yu et al. (2017) have
-
Quantum Annealing via Path-Integral Monte Carlo with Data Augmentation J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-01 Jianchang Hu; Yazhen Wang
This paper considers quantum annealing in the Ising framework for solving combinatorial optimization problems. The path-integral Monte Carlo simulation approach is often used to approximate quantum annealing and implement the approximation on classical computers, which is referred to as simulated quantum annealing. In this paper we introduce a data augmentation scheme into simulated quantum annealing and develop
-
Nonlinear Variable Selection via Deep Neural Networks J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-09-01 Yao Chen; Qingyi Gao; Faming Liang; Xiao Wang
This paper presents a general framework for high-dimensional nonlinear variable selection using deep neural networks under the framework of supervised learning. The network architecture includes both a selection layer and approximation layers. The problem can be cast as a sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers
-
Reduced-dimensional Monte Carlo Maximum Likelihood for Latent Gaussian Random Field Models J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-08-24 Jaewoo Park; Murali Haran
Monte Carlo maximum likelihood (MCML) provides an elegant approach to find maximum likelihood estimators (MLEs) for latent variable models. However, MCML algorithms are computationally expensive when the latent variables are high-dimensional and correlated, as is the case for latent Gaussian random field models. Latent Gaussian random field models are widely used, for example in building flexible regression
-
Nonstationary modeling with sparsity for spatial data via the basis graphical lasso J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-08-19 Mitchell Krock; William Kleiber; Stephen Becker
Many modern spatial models express the stochastic variation component as a basis expansion with random coefficients. Low rank models, approximate spectral decompositions, multiresolution representations, stochastic partial differential equations, and empirical orthogonal functions all fall within this basic framework. Given a particular basis, stochastic dependence relies on flexible modeling of the
-
Dimension reduction for outlier detection using DOBIN J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-08-18 Sevvandi Kandanaarachchi; Rob J. Hyndman
This paper introduces DOBIN, a new approach to select a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository, by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate
-
Functional regression for densely observed data with novel regularization J. Comput. Graph. Stat. (IF 2.319) Pub Date : 2020-08-14 Ruiyan Luo; Xin Qi
Smoothness penalty is an efficient regularization method in functional data analysis. However, for a spiky coefficient function which may arise when densely observed spiky functional data are involved, the traditional smoothness penalty could be too strong and lead to an over-smoothed estimate. In this paper, we propose a new family of smoothness penalties which are expressed using wavelet coefficients