
Assessing and Visualizing Simultaneous Simulation Error J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-18
Nathan Robertson; James M. Flegal; Dootika Vats; Galin L. Jones
Monte Carlo experiments produce samples in order to estimate features such as means and quantiles of a given distribution. However, simultaneous estimation of means and quantiles has received little attention. In this setting we establish a multivariate central limit theorem for any finite combination of sample means and quantiles under the assumption of a strongly mixing process, which includes the …

Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-16
Congyuan Yang; Carey E. Priebe; Youngser Park; David J. Marchette
Our problem of interest is to cluster vertices of a graph by identifying underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods because it is easy to implement while often outperforming more traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating …

Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-11
Indrayudh Ghosal; Giles Hooker
In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest has a reduced …
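The residual-refitting recipe described in this abstract can be sketched in a few lines. This is a minimal illustration using scikit-learn's `RandomForestRegressor` (assumed available); the authors' actual estimator and its variance estimate are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.1 * rng.standard_normal(500)

# Stage 1: an ordinary random forest fit.
rf1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
resid = y - rf1.predict(X)

# Stage 2: a second forest fit to the stage-1 residuals.
rf2 = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, resid)

def boosted_predict(X_new):
    """One-step boosted forest: the sum of the two stage predictions."""
    return rf1.predict(X_new) + rf2.predict(X_new)

mse_single = np.mean((y - rf1.predict(X)) ** 2)
mse_boosted = np.mean((y - boosted_predict(X)) ** 2)
```

In-sample, the second forest absorbs part of the first forest's systematic error, so `mse_boosted` should not exceed `mse_single`; the paper's contribution is showing when this carries over to test error and how to estimate the variance of the combined prediction.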

Global Consensus Monte Carlo J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-08
Lewis J. Rendell; Adam M. Johansen; Anthony Lee; Nick Whiteley
To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each …

Model-based edge clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-04
Daniel K. Sewell
Relational data can be studied using network analytic techniques which define the network as a set of actors and a set of edges connecting these actors. One important facet of network analysis that receives significant attention is community detection. However, while most community detection algorithms focus on clustering the actors of the network, it is very intuitive to cluster the edges. Connections …

An Exact Auxiliary Variable Gibbs Sampler for a Class of Diffusions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Qi Wang; Vinayak Rao; Yee Whye Teh
Stochastic differential equations (SDEs) or diffusions are continuous-valued, continuous-time stochastic processes widely used in the applied and mathematical sciences. Simulating paths from these processes is usually an intractable problem, and typically involves time-discretization approximations. We propose an exact Markov chain Monte Carlo sampling algorithm that involves no such time-discretization …
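For context, the time-discretization that this exact sampler avoids is typically the Euler–Maruyama scheme. A minimal sketch for the Ornstein–Uhlenbeck SDE dX_t = −θ X_t dt + σ dW_t (chosen here purely as an illustration, not taken from the paper):

```python
import numpy as np

def euler_maruyama(x0, theta, sigma, T, n_steps, rng):
    """Approximate one path of dX_t = -theta*X_t dt + sigma dW_t on [0, T]."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        dW = rng.standard_normal() * np.sqrt(dt)  # Brownian increment
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * dW
    return x

rng = np.random.default_rng(42)
path = euler_maruyama(x0=1.0, theta=2.0, sigma=0.5, T=1.0, n_steps=1000, rng=rng)
```

The discretization error of such schemes vanishes only as `n_steps` grows, which is exactly the bias the paper's auxiliary variable Gibbs sampler eliminates.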

Improving Bayesian Local Spatial Models in Large Data Sets J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Amanda Lenzi; Stefano Castruccio; Håvard Rue; Marc G. Genton
Environmental processes resolved at a sufficiently small scale in space and time inevitably display nonstationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global nonstationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated …

Shrinking the Covariance Matrix using Convex Penalties on the Matrix-Log Transformation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Mengxi Yi; David E. Tyler
For q-dimensional data, penalized versions of the sample covariance matrix are important when the sample size is small or modest relative to q. Since the negative log-likelihood under multivariate normal sampling is convex in Σ⁻¹, the inverse of the covariance matrix, it is common to consider additive penalties which are also convex in Σ⁻¹. More recently, Deng and Tsui (2013) and Yu et al. (2017) have …
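In symbols, writing S for the sample covariance and Ω = Σ⁻¹ for the precision matrix, the convex-in-Ω penalized problem alluded to above takes the familiar form (a generic sketch; the paper's contribution is to place the convex penalty P on the matrix-log transformation instead):

```latex
\hat{\Omega} \;=\; \arg\min_{\Omega \succ 0}\;\; \operatorname{tr}(S\,\Omega) \;-\; \log\det\Omega \;+\; \lambda\, P(\Omega)
```

The first two terms are the (scaled) normal negative log-likelihood, which is convex in Ω because tr(SΩ) is linear and −log det Ω is convex on the positive-definite cone.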

Quantum Annealing via Path-Integral Monte Carlo with Data Augmentation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Jianchang Hu; Yazhen Wang
This paper considers quantum annealing in the Ising framework for solving combinatorial optimization problems. The path-integral Monte Carlo simulation approach is often used to approximate quantum annealing and implement the approximation by classical computers, which is referred to as simulated quantum annealing. In this paper we introduce a data augmentation scheme into simulated quantum annealing and develop …

Nonlinear Variable Selection via Deep Neural Networks J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Yao Chen; Qingyi Gao; Faming Liang; Xiao Wang
This paper presents a general framework for high-dimensional nonlinear variable selection using deep neural networks under the framework of supervised learning. The network architecture includes both a selection layer and approximation layers. The problem can be cast as a sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers …

Reduced-dimensional Monte Carlo Maximum Likelihood for Latent Gaussian Random Field Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-24
Jaewoo Park; Murali Haran
Monte Carlo maximum likelihood (MCML) provides an elegant approach to find maximum likelihood estimators (MLEs) for latent variable models. However, MCML algorithms are computationally expensive when the latent variables are high-dimensional and correlated, as is the case for latent Gaussian random field models. Latent Gaussian random field models are widely used, for example in building flexible regression …

Nonstationary modeling with sparsity for spatial data via the basis graphical lasso J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-19
Mitchell Krock; William Kleiber; Stephen Becker
Many modern spatial models express the stochastic variation component as a basis expansion with random coefficients. Low-rank models, approximate spectral decompositions, multiresolution representations, stochastic partial differential equations, and empirical orthogonal functions all fall within this basic framework. Given a particular basis, stochastic dependence relies on flexible modeling of the …

Dimension reduction for outlier detection using DOBIN J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-18
Sevvandi Kandanaarachchi; Rob J. Hyndman
This paper introduces DOBIN, a new approach to select a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository, by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate …

Functional regression for densely observed data with novel regularization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Ruiyan Luo; Xin Qi
Smoothness penalty is an efficient regularization method in functional data analysis. However, for a spiky coefficient function, which may arise when densely observed spiky functional data are involved, the traditional smoothness penalty could be too strong and lead to an over-smoothed estimate. In this paper, we propose a new family of smoothness penalties which are expressed using wavelet coefficients …

Fast Search and Estimation of Bayesian Nonparametric Mixture Models Using a Classification Annealing EM Algorithm J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
George Karabatsos
Bayesian nonparametric (BNP) infinite-mixture models provide flexible and accurate density estimation, cluster analysis, and regression. However, for the posterior inference of such a model, MCMC algorithms are complex, often need to be tailor-made for different BNP priors, and are intractable for large data sets. We introduce a BNP classification annealing EM algorithm which employs importance sampling …

Trace Ratio Optimization for High-Dimensional Multi-Class Discrimination J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Jeongyoun Ahn; Hee Cheol Chung; Yongho Jeon
In multi-class discrimination with high-dimensional data, identifying a lower-dimensional subspace with maximum class separation is crucial. We propose a new optimization criterion for finding such a discriminant subspace, which is the ratio of two traces: the trace of the between-class scatter matrix and the trace of the within-class scatter matrix. Since this problem is not well-defined for high-dimensional …
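The two quantities in the proposed ratio are easy to compute directly. A numpy sketch of the between-class and within-class scatter traces on toy data (this illustrates the criterion only, not the authors' optimization algorithm for finding the subspace):

```python
import numpy as np

def scatter_traces(X, labels):
    """Return (trace of between-class scatter, trace of within-class scatter)."""
    overall_mean = X.mean(axis=0)
    tr_between, tr_within = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        # tr(n_c (m_c - m)(m_c - m)^T) = n_c * ||m_c - m||^2
        tr_between += len(Xc) * np.sum((mc - overall_mean) ** 2)
        tr_within += np.sum((Xc - mc) ** 2)
    return tr_between, tr_within

rng = np.random.default_rng(0)
# Two well-separated classes in 5 dimensions.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
labels = np.repeat([0, 1], 50)
tb, tw = scatter_traces(X, labels)
ratio = tb / tw
```

A large ratio indicates that class means are spread far apart relative to the within-class variability; the paper seeks a projection maximizing this ratio when the dimension exceeds the sample size.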

Spectrally Sparse Nonparametric Regression via Elastic Net Regularized Smoothers J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Nathaniel E. Helwig
Nonparametric regression frameworks, such as generalized additive models (GAMs) and smoothing spline analysis of variance (SSANOVA) models, extend the generalized linear model (GLM) by allowing for unknown functional relationships between an exponential family response variable and a collection of predictor variables. The unknown functional relationships are typically estimated using penalized likelihood …

Model-Free Variable Selection with Matrix-Valued Predictors J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Zeda Li; Yuexiao Dong
We introduce a novel framework for model-free variable selection with matrix-valued predictors. To test the importance of rows, columns, and submatrices of the predictor matrix in terms of predicting the response, three types of hypotheses are formulated under a unified framework. The asymptotic properties of the test statistics under the null hypothesis are established, and a permutation testing algorithm …

Anomaly Detection in High Dimensional Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-13
Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles
The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation whose k-nearest …
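The underlying idea of scoring points by their k-nearest-neighbor distance can be sketched generically (this is not the HDoutliers algorithm or the authors' refinement of it, just the basic kNN-distance detector the abstract alludes to):

```python
import numpy as np

def knn_distance_scores(X, k=3):
    """Anomaly score of each point: distance to its k-th nearest neighbor."""
    diffs = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diffs ** 2).sum(-1))       # full pairwise distance matrix
    np.fill_diagonal(d, np.inf)             # exclude self-distance
    return np.sort(d, axis=1)[:, k - 1]     # k-th smallest distance per row

rng = np.random.default_rng(1)
X = rng.normal(0, 1, size=(100, 4))
X[0] = 12.0                                 # plant one obvious outlier
scores = knn_distance_scores(X, k=3)
flagged = int(np.argmax(scores))            # index of the most anomalous point
```

Points far from all of their neighbors receive large scores; the paper's contribution is, among other things, a principled threshold for deciding which scores are anomalous.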

Marginally-calibrated deep distributional regression J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-13
Nadja Klein; David J. Nott; Michael Stanley Smith
Deep neural network (DNN) regression models are widely used in applications requiring state-of-the-art predictive accuracy. However, until recently there has been little work on accurate uncertainty quantification for predictions from such models. We add to this literature by outlining an approach to constructing predictive distributions that are ‘marginally calibrated’. This is where the long-run …

An efficient algorithm for minimizing multi nonsmooth component functions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-05
Minh Pham; Anh Ninh; Hoang Le; Yufeng Liu
Many problems in statistics and machine learning can be formulated as an optimization problem of a finite sum of nonsmooth convex functions. We propose an algorithm to minimize this type of objective function based on the idea of alternating linearization. Our algorithm retains the simplicity of contemporary methods without any restrictive assumptions on the smoothness of the loss function. We apply …

Model interpretation through lower-dimensional posterior summarization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-07-21
Spencer Woody; Carlos M. Carvalho; Jared S. Murray
Nonparametric regression models have recently surged in their power and popularity, accompanying the trend of increasing dataset size and complexity. While these models have proven their predictive ability in empirical settings, they are often difficult to interpret and do not address the underlying inferential goals of the analyst or decision maker. In this paper, we propose a modular two-stage approach …

U-statistical inference for hierarchical clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-07-20
Marcio Valk; Gabriela Bettella Cybis
Clustering methods are valuable tools for the identification of patterns in high-dimensional data with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop a U-statistics-based clustering approach that assesses statistical significance in clustering and …

mcvis: A new framework for collinearity discovery, diagnostic and visualization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-30
Chen Lin; Kevin Wang; Samuel Mueller
Collinearity discovery through diagnostic tools is an important analysis step when performing linear regression. Despite their widespread use, collinearity indices such as the variance inflation factor and the condition number have limitations and may not be effective in some applications. In this article we contribute to the study of conventional collinearity indices through theoretical and …

Sparse Single Index Models for Multivariate Responses J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-30
Yuan Feng; Luo Xiao; Eric C. Chi
Joint models are popular for analyzing data with multivariate responses. We propose a sparse multivariate single index model, where responses and predictors are linked by unspecified smooth functions and multiple matrix-level penalties are employed to select predictors and induce low-rank structures across responses. An alternating direction method of multipliers (ADMM) based algorithm is proposed …

Optimal Sampling for Generalized Linear Models under Measurement Constraints J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-08
Tao Zhang; Yang Ning; David Ruppert
Under “measurement constraints,” responses are expensive to measure and initially unavailable for most records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of the dataset where the expensive responses will be measured, such that the resulting sampling estimator is statistically efficient. Measurement constraints require the sampling …

Bayesian spatial clustering of extremal behaviour for hydrological variables J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-04
Christian Rohrbeck; Jonathan A. Tawn
To address the need for efficient inference for a range of hydrological extreme value problems, spatial pooling of information is the standard approach for marginal tail estimation. We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial …

A slice tour for finding hollowness in high-dimensional data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-04
Ursula Laa; Dianne Cook; German Valencia
Taking projections of high-dimensional data is a common analytical and visualisation technique in statistics for working with high-dimensional problems. Sectioning, or slicing, through high dimensions is less common, but can be useful for visualising data with concavities, or nonlinear structure. It is associated with conditional distributions in statistics, and also linked brushing between plots …

Illumination depth J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-04
Stanislav Nagy; Jiří Dvořák
The concept of illumination bodies studied in convex geometry is used to amend the halfspace depth for multivariate data. The proposed notion of illumination enables finer resolution of the sample points, naturally breaks ties in the associated depth-based ordering, and introduces a depth-like function for points outside the convex hull of the support of the probability measure. The illumination is …

Surrogate Residuals for Discrete Choice Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-02
Chao Cheng; Rui Wang; Heping Zhang
Discrete Choice Models (DCMs) are a class of models for modelling response variables that take values from a set of alternatives. Examples include logistic regression, probit regression, and multinomial logistic regression. These models are also referred to collectively as generalized linear models. Although there exist methods for assessing the goodness of fit of DCMs, defining intuitive residuals for such models …

Delayed acceptance ABC-SMC J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-02
Richard G. Everitt; Paulina A. Rowińska
Approximate Bayesian computation (ABC) is now an established technique for statistical inference used in cases where the likelihood function is computationally expensive or not available. It relies on the use of a model that is specified in the form of a simulator, and approximates the likelihood at a parameter value θ by simulating auxiliary data sets x and evaluating the distance of x from the true …

Correction J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-28
(2020). Correction. Journal of Computational and Graphical Statistics. Ahead of Print.

Efficient Parameter Sampling for Markov Jump Processes J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Boqian Zhang; Vinayak Rao
Markov jump processes are continuous-time stochastic processes widely used in a variety of applied disciplines. Inference typically proceeds via Markov chain Monte Carlo, the state of the art being a uniformization-based auxiliary variable Gibbs sampler. This was designed for situations where the process parameters are known, and Bayesian inference over unknown parameters is typically carried out by …

Automated Redistricting Simulation Using Markov Chain Monte Carlo J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Benjamin Fifield; Michael Higgins; Kosuke Imai; Alexander Tarr
Legislative redistricting is a critical element of representative democracy. A number of political scientists have used simulation methods to sample redistricting plans under various constraints to assess their impact on partisanship and other aspects of representation. However, while many optimization algorithms have been proposed, surprisingly few simulation methods exist in the published scholarship …

Predicting the Output From a Stochastic Computer Model When a Deterministic Approximation is Available J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Evan Baker; Peter Challenor; Matt Eames
Statistically modeling the output of a stochastic computer model can be difficult to do accurately without a large simulation budget. We alleviate this problem by exploiting readily available deterministic approximations to efficiently learn about the respective stochastic computer models. This is done via the summation of two Gaussian processes; one responsible for modeling the deterministic approximation …

Identifying Heterogeneous Effect using Latent Supervised Clustering with Adaptive Fusion J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-06
Jingxiang Chen; Quoc Tran-Dinh; Michael R. Kosorok; Yufeng Liu
Precision medicine is an important area of research with the goal of identifying the optimal treatment for each individual patient. In the literature, various methods have been proposed to divide the population into subgroups according to the heterogeneous effects of individuals. In this paper, a new exploratory machine learning tool, named latent supervised clustering, is proposed to identify the heterogeneous …

Massive parallelization boosts big Bayesian multidimensional scaling J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-05
Andrew J. Holbrook; Philippe Lemey; Guy Baele; Simon Dellicour; Dirk Brockmann; Andrew Rambaut; Marc A. Suchard
Big Bayes is the computationally intensive co-application of big data and large, expressive Bayesian models for the analysis of complex phenomena in scientific inference and statistical learning. Standing as an example, Bayesian multidimensional scaling (MDS) can help scientists learn viral trajectories through space-time, but its computational burden prevents its wider use. Crucial MDS model calculations …

Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-27
Gregory J. Hunt; Mark A. Dane; James E. Korkola; Laura M. Heiser; Johann A. Gagnon-Bartsch
Proper data transformation is an essential part of analysis. Choosing appropriate transformations for variables can enhance visualization, improve efficacy of analytical methods, and increase data interpretability. However, determining appropriate transformations of variables from high-content imaging data poses new challenges. Imaging data produce hundreds of covariates from each of thousands of images …

Rerandomization strategies for balancing covariates using pre-experimental longitudinal data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-21
Per Johansson; Mårten Schultzberg
This paper considers experimental design based on the strategy of rerandomization to increase the efficiency in experiments. Two aspects of rerandomization are addressed. First, we propose a two-stage allocation sample scheme for randomization inference to the units in experiments that guarantees that the difference-in-means estimator is an unbiased estimator of the sample average treatment …

A Pliable Lasso J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-05
Robert Tibshirani; Jerome Friedman
We propose a generalization of the lasso that allows the model coefficients to vary as a function of a general set of some prespecified modifying variables. These modifiers might be variables such as gender, age, or time. The paradigm is quite general, with each lasso coefficient modified by a sparse linear function of the modifying variables Z. The model is estimated in a hierarchical fashion to control …
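Following the abstract's description, the mean model can be sketched as follows, with Z the matrix of modifying variables, X_j the j-th predictor (entering multiplicatively, elementwise), and θ_j the sparse modifier coefficients; the hierarchical penalty that ties θ_j to β_j is detailed in the paper:

```latex
\hat{y} \;=\; \beta_0 \mathbf{1} \;+\; Z\theta_0 \;+\; \sum_{j=1}^{p} X_j \bigl(\beta_j \mathbf{1} + Z\theta_j\bigr)
```

When every θ_j = 0, the model collapses to an ordinary lasso regression, which is what makes the formulation a strict generalization.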

Bivariate Residual Plots With Simulation Polygons J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-06
Rafael A. Moral; John Hinde; Clarice G. B. Demétrio
When using univariate models, goodness of fit can be assessed through many different methods, including graphical tools such as half-normal plots with a simulation envelope. This is straightforward due to the notion of ordering of a univariate sample, which can readily reveal possible outliers. In the bivariate case, however, it is often difficult to detect extreme points and verify whether a sample …

Estimating Time-Varying Graphical Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-03
Jilei Yang; Jie Peng
In this article, we study time-varying graphical models based on data measured over a temporal grid. Such models are motivated by the need to describe and understand evolving interacting relationships among a set of random variables in many real applications, for instance, the study of how stock prices interact with each other and how such interactions change over time. We propose a new model, LOcal …

Bayesian Model Averaging Over Tree-based Dependence Structures for Multivariate Extremes J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-08-29
Sabrina Vettori; Raphaël Huser; Johan Segers; Marc G. Genton
Describing the complex dependence structure of extreme phenomena is particularly challenging. To tackle this issue, we develop a novel statistical method that describes extremal dependence taking advantage of the inherent tree-based dependence structure of the max-stable nested logistic distribution, and which identifies possible clusters of extreme variables using reversible jump Markov chain Monte …

Estimating the Number of Clusters Using Cross-Validation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-30
Wei Fu; Patrick O. Perry
Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong modeling assumptions. This article proposes a data-driven approach to estimate the number of clusters based on a novel form of cross-validation. The proposed method differs …
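A naive held-out-distortion version of cross-validating k makes the problem concrete (this is not the article's novel scheme, whose point is precisely to improve on this kind of approach; everything below is a toy sketch):

```python
import numpy as np

def kmeans(X, k, rng, n_iter=25):
    """Lloyd's algorithm with greedy farthest-point initialization."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        centers.append(X[np.argmax(d2)])       # seed next center far away
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def heldout_distortion(train, test, k, rng):
    """Mean squared distance from held-out points to their nearest center."""
    centers = kmeans(train, k, rng)
    return ((test[:, None] - centers[None]) ** 2).sum(-1).min(axis=1).mean()

rng = np.random.default_rng(0)
# Three well-separated 2-D clusters.
X = np.vstack([rng.normal(m, 0.3, size=(60, 2)) for m in (0.0, 5.0, 10.0)])
rng.shuffle(X)
train, test = X[:120], X[120:]
scores = {k: heldout_distortion(train, test, k, rng) for k in range(1, 7)}
```

Held-out distortion drops sharply up to the true number of clusters and then typically keeps shrinking slowly as k grows, so the naive argmin tends to over-select; that weakness is what motivates a more careful cross-validation construction like the article's.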

Parallelization of a Common Changepoint Detection Method J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-06
S. O. Tickle; I. A. Eckley; P. Fearnhead; K. Haynes
In recent years, various means of efficiently detecting changepoints have been proposed, with one popular approach involving minimizing a penalized cost function using dynamic programming. In some situations, these algorithms can have an expected computational cost that is linear in the number of data points; however, the worst-case cost remains quadratic. We introduce two means of improving …

Efficient Construction of Test Inversion Confidence Intervals Using Quantile Regression J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-03
Eyal Fisher; Regev Schweiger; Saharon Rosset
Modern problems in statistics often include estimators of high computational complexity and with complicated distributions. Statistical inference on such estimators usually relies on asymptotic normality assumptions; however, such assumptions are often not applicable for available sample sizes, due to dependencies in the data. A common alternative is the use of resampling procedures, such as bootstrapping …

Testing Sparsity-Inducing Penalties J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-08-19
Maryclare Griffin; Peter D. Hoff
Many penalized maximum likelihood estimators correspond to posterior mode estimators under specific prior distributions. Appropriateness of a particular class of penalty functions can therefore be interpreted as the appropriateness of a prior for the parameters. For example, the appropriateness of a lasso penalty for regression coefficients depends on the extent to which the empirical distribution …
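The lasso example rests on a standard equivalence, stated here for reference (with Gaussian noise variance σ², the lasso solution is the posterior mode under independent Laplace priors on the coefficients):

```latex
\hat{\beta}_{\text{lasso}}
= \arg\min_{\beta}\; \tfrac{1}{2}\,\|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1
= \arg\max_{\beta}\; p(\beta \mid y),
\qquad
p(\beta_j) \propto \exp\!\bigl(-\lambda\,|\beta_j| / \sigma^2\bigr).
```

Testing whether the penalty is appropriate for given data can therefore be framed as testing whether a Laplace-like prior is plausible for the coefficients, which is the perspective the abstract describes.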

Diagonal Discriminant Analysis With Feature Selection for High-Dimensional Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-08-16
Sarah E. Romanes; John T. Ormerod; Jean Y. H. Yang
We introduce a new method of performing high-dimensional discriminant analysis (DA), which we call multiDA. Starting from multi-class diagonal DA classifiers, which avoid the problem of high-dimensional covariance estimation, we construct a hybrid model that seamlessly integrates feature selection components. Our feature selection component naturally simplifies to weights which are simple functions of …

Bayesian Deep Net GLM and GLMM J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-08-16
M.N. Tran; N. Nguyen; D. Nott; R. Kohn
Deep feedforward neural networks (DFNNs) are a powerful tool for functional approximation. We describe flexible versions of generalized linear and generalized linear mixed models incorporating basis functions formed by a DFNN. The consideration of neural networks with random effects is not widely used in the literature, perhaps because of the computational challenges of incorporating subject-specific …

Dynamic Visualization and Fast Computation for Convex Clustering via Algorithmic Regularization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-19
Michael Weylandt; John Nagorski; Genevera I. Allen
Convex clustering is a promising new approach to the classical problem of clustering, combining strong performance in empirical studies with rigorous theoretical foundations. Despite these advantages, convex clustering has not been widely adopted, due to its computationally intensive nature and its lack of compelling visualizations. To address these impediments, we introduce Algorithmic Regularization …

Scalable Visualization Methods for Modern Generalized Additive Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-19
Matteo Fasiolo; Raphaël Nedellec; Yannig Goude; Simon N. Wood
In the last two decades, the growth of computational resources has made it possible to handle generalized additive models (GAMs) that formerly were too costly for serious applications. However, the growth in model complexity has not been matched by improved visualizations for model development and results presentation. Motivated by an industrial application in electricity load forecasting, we identify …

A Function Emulation Approach for Doubly Intractable Distributions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-23
Jaewoo Park; Murali Haran
Doubly intractable distributions arise in many settings, for example, in Markov models for point processes and exponential random graph models for networks. Bayesian inference for these models is challenging because they involve intractable normalizing “constants” that are actually functions of the parameters of interest. Although several computational methods have been developed for these models, …

Scalable Bayesian Nonparametric Clustering and Classification J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-19
Yang Ni; Peter Müller; Maurice Diesendruck; Sinead Williamson; Yitan Zhu; Yuan Ji
We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is “embarrassingly parallel” and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large …

BIVAS: A Scalable Bayesian Method for Bi-Level Variable Selection With Applications J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-19
Mingxuan Cai; Mingwei Dai; Jingsi Ming; Heng Peng; Jin Liu; Can Yang
In this article, we consider a Bayesian bi-level variable selection problem in high-dimensional regressions. In many practical situations, it is natural to assign group membership to each predictor. For example, genetic variants can be grouped at the gene level, and a covariate from different tasks naturally forms a group. Thus, it is of interest to select important groups as well as important …

Scalable Bayesian Regression in High Dimensions With Multiple Data Sources J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-15
Konstantinos Perrakis; Sach Mukherjee; The Alzheimer’s Disease Neuroimaging Initiative
Applications of high-dimensional regression often involve multiple sources or types of covariates. We propose methodology for this setting, emphasizing the “wide data” regime with large total dimensionality p and sample size n≪p. We focus on a flexible ridge-type prior with shrinkage levels that are specific to each data type or source and that are set automatically by empirical Bayes. All estimation …

Anomaly Detection in Streaming Nonstationary Temporal Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-06-24
Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles; Sevvandi Kandanaarachchi; Mario A. Muñoz
This article proposes a framework that provides early detection of anomalous series within a large collection of nonstationary streaming time-series data. We define an anomaly as an observation that is very unlikely given the recent distribution of a given system. The proposed framework first calculates a boundary for the system’s typical behavior using extreme value theory. Then a sliding window …

A Semiparametric Bayesian Approach to Dropout in Longitudinal Studies With Auxiliary Covariates J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-07-02
Tianjian Zhou; Michael J. Daniels; Peter Müller
We develop a semiparametric Bayesian approach to missing outcome data in longitudinal studies in the presence of auxiliary covariates. We consider a joint model for the full data response, missingness, and auxiliary covariates. We include auxiliary covariates to “move” the missingness “closer” to missing at random. In particular, we specify a semiparametric Bayesian model for the observed data via …

Generalized Spatially Varying Coefficient Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-20
Myungjin Kim; Li Wang
In this paper, we introduce a new class of nonparametric regression models, called generalized spatially varying coefficient models (GSVCMs), for data distributed over complex domains. For model estimation, we propose a nonparametric quasi-likelihood approach using the bivariate penalized spline approximation technique. We show that our estimation procedure is able to handle irregularly shaped spatial …

High-Dimensional Copula Variational Approximation Through Transformation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-20
Michael Stanley Smith; Rubén Loaiza-Maya; David J. Nott
Variational methods are attractive for computing Bayesian inference when exact inference is impractical. They approximate a target distribution—either the posterior or an augmented posterior—using a simpler distribution that is selected to balance accuracy with computational feasibility. Here, we approximate an elementwise parametric transformation of the target distribution as multivariate Gaussian …

Poisson Kernel-Based Clustering on the Sphere: Convergence Properties, Identifiability, and a Method of Sampling J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-20
Mojgan Golzy; Marianthi Markatou
Spherical or directional data arise in many applications of interest. Furthermore, many nondirectional datasets can be usefully re-expressed in the form of directions and analyzed as spherical data. We have proposed a clustering algorithm using mixtures of Poisson kernel-based densities (PKBD) on the sphere. We prove convergence of the associated generalized EM algorithm, investigate the identifiability …