
An explicit mean-covariance parameterization for multivariate response linear regression J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-23
Aaron J. Molstad; Guangwei Weng; Charles R. Doss; Adam J. Rothman. Abstract: We develop a new method to fit the multivariate response linear regression model that exploits a parametric link between the regression coefficient matrix and the error covariance matrix. Specifically, we assume that the correlations between entries in the multivariate error random vector are proportional to the cosines of the angles between their corresponding regression coefficient matrix…

Additive Functional Cox Model J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-23
Erjia Cui; Ciprian M. Crainiceanu; Andrew Leroux. Abstract: We propose the Additive Functional Cox Model to flexibly quantify the association between functional covariates and time-to-event data. The model extends the linear functional proportional hazards model by allowing the association between the functional covariate and the log hazard to vary nonlinearly in both the functional domain and the value of the functional covariate. Additionally, we introduce…

Change point detection for graphical models in the presence of missing values J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-23
Malte Londschien; Solt Kovács; Peter Bühlmann. Abstract: We propose estimation methods for change points in high-dimensional covariance structures with an emphasis on challenging scenarios with missing values. We advocate three imputation-like methods and investigate their implications on common losses used for change point detection. We also discuss how model selection methods have to be adapted to the setting of incomplete data. The methods are…

Kriging Riemannian Data via Random Domain Decompositions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-20
Alessandra Menafoglio; Davide Pigoli; Piercesare Secchi. Abstract: Data taking values on a Riemannian manifold and observed over a complex spatial domain are becoming more frequent in applications, e.g., in environmental sciences and in geoscience. The analysis of these data needs to rely on local models to account for the nonstationarity of the generating random process, the nonlinearity of the manifold, and the complex topology of the domain. In this paper…

MIPBOOST: Efficient and Effective L0 Feature Selection for Linear Regression J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-17
Ana Kenney; Francesca Chiaromonte; Giovanni Felici. Abstract: Recent advances in mathematical programming have made Mixed Integer Optimization a competitive alternative to popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. Here we propose MIPBOOST, a revision of standard Mixed Integer Programming feature selection…

LowCon: A design-based subsampling approach in a misspecified linear model J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-02
Cheng Meng; Rui Xie; Abhyuday Mandal; Xinlian Zhang; Wenxuan Zhong; Ping Ma. Abstract: We consider a measurement-constrained supervised learning problem, that is: (1) the full sample of predictors is given; (2) the response observations are unavailable and expensive to measure. Thus, it is ideal to select a subsample of predictor observations, measure the corresponding responses, and then fit the supervised learning model on the subsample of the predictors and responses. However…

Nonparametric Anomaly Detection on Time Series of Graphs J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-02
Dorcas Ofori-Boateng; Yulia R. Gel; Ivor Cribben. Abstract: Identifying change points and/or anomalies in dynamic network structures has become increasingly popular across various domains, from neuroscience to telecommunication to finance. One of the particular objectives of the anomaly detection task from the neuroscience perspective is the reconstruction of the dynamic manner of brain region interactions. However, most statistical methods for detecting…

Modeling nonstationary extreme dependence with stationary max-stable processes and multidimensional scaling J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-11-02
Clément Chevalier; Olivia Martius; David Ginsbourger. Abstract: Modeling the joint distribution of extreme events at multiple locations is a challenging task with important applications. In this study, we use max-stable models to study extreme daily precipitation events in Switzerland. The nonstationarity of the spatial process at hand involves important challenges, which are often dealt with by using a stationary model in a so-called climate space, with…

Scalable Algorithms for Large Competing Risks Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-29
Eric S. Kawaguchi; Jenny I. Shen; Marc A. Suchard; Gang Li. Abstract: This paper develops two orthogonal contributions to scalable sparse regression for competing risks time-to-event data. First, we study and accelerate the broken adaptive ridge method (BAR), a surrogate-ℓ0, iteratively reweighted ℓ2-penalization algorithm that achieves sparsity in its limit, in the context of the Fine-Gray (1999) proportional subdistribution hazards (PSH) model. In…
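The reweighting idea behind BAR can be sketched in isolation. The toy below is illustrative only, not the paper's Fine-Gray implementation: it assumes an orthonormal design, so the iteratively reweighted ℓ2 (ridge) update decouples into one scalar fixed-point iteration per coefficient, with small coefficients receiving ever-larger penalty weights until they collapse to exact zeros.

```python
# Broken adaptive ridge (BAR) iteration, sketched for an orthonormal design,
# where the ridge update decouples across coordinates:
#   beta_j <- z_j / (1 + lam / beta_j^2),   z_j = x_j' y  (the OLS coefficient).
# A small |beta_j| inflates its penalty weight, driving it to an exact zero in
# the limit, which is how the surrogate-ell_0 sparsity arises.
def bar_orthonormal(z, lam=0.5, n_iter=200, tol=1e-12):
    beta = list(z)  # initialize at the OLS solution
    for _ in range(n_iter):
        new = []
        for zj, bj in zip(z, beta):
            if abs(bj) < tol:           # coordinate already collapsed to zero
                new.append(0.0)
            else:
                new.append(zj / (1.0 + lam / bj ** 2))
        beta = new
    return beta

# OLS coefficients: two strong signals and two near-noise coordinates
z = [3.0, -2.5, 0.1, -0.05]
beta = bar_orthonormal(z)
```

The strong coordinates converge to slightly shrunk fixed points, while the two near-noise coordinates are driven to exactly zero, which is the sparsity-in-the-limit behavior the abstract describes.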

Particle MCMC with Poisson Resampling: Parallelization and Continuous Time Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-26
Tomasz Cakala; Blazej Miasojedow; Wojciech Niemiro. Abstract: We introduce a new version of the particle filter in which the number of “children” of a particle at a given time has a Poisson distribution. As a result, the number of particles is random and varies with time. An advantage of this scheme is that descendants of different particles can evolve independently, which makes it easy to parallelize computations. Moreover, the particle filter with Poisson resampling…

Bayesian Variable Selection for Gaussian copula regression models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-26
A. Alexopoulos; L. Bottolo. Abstract: We develop a novel Bayesian method to select important predictors in regression models with multiple responses of diverse types. A sparse Gaussian copula regression model is used to account for the multivariate dependencies between any combination of discrete and/or continuous responses and their association with a set of predictors. We utilize the parameter expansion for data augmentation…

Penalized Quantile Regression for Distributed Big Data Using the Slack Variable Representation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-26
Ye Fan; Nan Lin; Xianjun Yin. Abstract: Penalized quantile regression is a widely used tool for analyzing high-dimensional data with heterogeneity. Although its estimation theory has been well studied in the literature, its computation still remains a challenge in big data, due to the nonsmoothness of the check loss function and the possible nonconvexity of the penalty term. In this paper, we propose the QPADM-slack method, a parallel…
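The check loss mentioned in the abstract is the nonsmooth piece on the data-fitting side. A minimal self-contained illustration (not the QPADM-slack algorithm itself): rho_tau(u) = u(tau - 1{u < 0}), and minimizing the empirical check loss over a constant recovers the sample tau-quantile.

```python
# Quantile "check" loss rho_tau(u) = u * (tau - 1{u < 0}); minimizing
# sum_i rho_tau(y_i - q) over q yields the empirical tau-quantile.
def check_loss(u, tau):
    return u * (tau - (1.0 if u < 0 else 0.0))

def empirical_quantile(y, tau):
    # brute-force minimizer over the sample points; the optimum of the
    # piecewise-linear objective is always attained at an order statistic
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [1.0, 2.0, 3.0, 4.0, 100.0]
med = empirical_quantile(y, 0.5)   # sample median
```

Note the asymmetry: for tau = 0.9 the loss charges positive residuals nine times as heavily as negative ones, pulling the minimizer toward the upper tail.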

Likelihood Evaluation of Jump-Diffusion Models Using Deterministic Nonlinear Filters* J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-26
Jean-François Bégin; Mathieu Boudreault. Abstract: In this study, we develop a deterministic nonlinear filtering algorithm based on a high-dimensional version of Kitagawa (1987) to evaluate the likelihood function of models that allow for stochastic volatility and jumps whose arrival intensity is also stochastic. We show numerically that the deterministic filtering method is precise and much faster than the particle filter, in addition to…

Local Linear Forests J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-10-15
Rina Friedberg; Julie Tibshirani; Susan Athey; Stefan Wager. Abstract: Random forests are a powerful method for nonparametric regression, but are limited in their ability to fit smooth signals. Taking the perspective of random forests as an adaptive kernel method, we pair the forest kernel with a local linear regression adjustment to better capture smoothness. The resulting procedure, local linear forests, enables us to improve on asymptotic rates of convergence…
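The local linear adjustment the abstract describes can be seen in its simplest univariate form: weighted least squares around a target point x0, where in the paper the weights come from the forest's adaptive kernel (a Gaussian kernel stands in below, so this is a sketch of the adjustment, not of the forest). A local linear fit reproduces linear signals exactly, which is precisely the smoothness a plain forest misses.

```python
# Local linear smoothing at x0: weighted least squares for y ~ a + b*(x - x0)
# with kernel weights w_i = K((x_i - x0)/h); the fitted intercept a is the
# estimate of m(x0). Closed-form solution of the 2x2 normal equations.
import math

def local_linear(xs, ys, x0, h=1.0):
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    sw   = sum(w)
    swx  = sum(wi * (x - x0) for wi, x in zip(w, xs))
    swxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    swy  = sum(wi * y for wi, y in zip(w, ys))
    swxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    return (swxx * swy - swx * swxy) / det   # intercept = estimate of m(x0)

xs = [i / 10 for i in range(21)]    # grid on [0, 2]
ys = [2.0 * x + 1.0 for x in xs]    # exactly linear signal
```

Because the local model contains the true linear function, the fit is exact everywhere, including at the boundary x0 = 0, where kernel averaging alone would be badly biased.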

Markov Chain Importance Sampling – a highly efficient estimator for MCMC J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-28
Ingmar Schuster; Ilja Klebanov. Abstract: Markov chain (MC) algorithms are ubiquitous in machine learning and statistics and many other disciplines. Typically, these algorithms can be formulated as acceptance-rejection methods. In this work we present a novel estimator applicable to these methods, dubbed Markov chain importance sampling (MCIS), which efficiently makes use of rejected proposals. For the unadjusted Langevin algorithm…
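A sketch of the recycling idea, not the authors' exact MCIS estimator: run an independence Metropolis-Hastings sampler and give every proposal, accepted or rejected, a self-normalized importance weight pi(y)/q(y). With an independence proposal this reduces to ordinary self-normalized importance sampling over the i.i.d. proposals, which makes the correctness of the recycled estimate easy to sanity-check.

```python
# Recycle rejected proposals: an independence MH sampler whose proposals
# (accepted or not) all contribute to a self-normalized importance-sampling
# estimate of E_pi[f], here f(x) = x for a standard normal target.
import math, random

def target(x):            # unnormalized N(0, 1) density
    return math.exp(-0.5 * x * x)

def proposal_pdf(x):      # q = N(0, 2^2)
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))

def mh_with_recycling(n, seed=1):
    rng = random.Random(seed)
    x = 0.0
    num = den = 0.0       # accumulators over ALL proposals
    for _ in range(n):
        y = rng.gauss(0.0, 2.0)
        w = target(y) / proposal_pdf(y)
        num += w * y
        den += w
        # the usual MH accept/reject step keeps the chain itself pi-invariant
        alpha = min(1.0, (target(y) * proposal_pdf(x)) / (target(x) * proposal_pdf(y)))
        if rng.random() < alpha:
            x = y
    return num / den

est = mh_with_recycling(20000)    # should be close to the target mean, 0
```

A vanilla MH average discards every rejected proposal; here each one still carries information through its weight, which is the efficiency gain the abstract points to.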

Asymptotically exact data augmentation: models, properties and algorithms J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-28
Maxime Vono; Nicolas Dobigeon; Pierre Chainais. Abstract: Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique to improve convergence properties, simplify the implementation, or reduce the computational time of inference methods such as Markov chain Monte Carlo. Nonetheless, introducing appropriate auxiliary variables while preserving the initial target probability distribution and offering a computationally…

Nonreversible jump algorithms for Bayesian nested model selection J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-28
Philippe Gagnon; Arnaud Doucet. Abstract: Nonreversible Markov chain Monte Carlo methods often outperform their reversible counterparts in terms of asymptotic variance of ergodic averages and mixing properties. Lifting the state space (Chen et al., 1999; Diaconis et al., 2000) is a generic technique for constructing such samplers. The idea is to think of the random variables we want to generate as position variables and to associate…

dblink: Distributed End-to-End Bayesian Entity Resolution J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-23
Neil G. Marchant; Andee Kaplan; Daniel N. Elazar; Benjamin I. P. Rubinstein; Rebecca C. Steorts. Entity resolution (ER; also known as record linkage or deduplication) is the process of merging noisy databases, often in the absence of unique identifiers. A major advancement in ER methodology has been the application of Bayesian generative models, which provide a natural framework for inferring latent entities with rigorous quantification of uncertainty. Despite these advantages, existing models…

A Slice Tour for Finding Hollowness in High-Dimensional Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-07-16
Ursula Laa; Dianne Cook; German Valencia. Taking projections of high-dimensional data is a common analytical and visualization technique in statistics for working with high-dimensional problems. Sectioning, or slicing, through high dimensions is less common, but can be useful for visualizing data with concavities or nonlinear structure. It is associated with conditional distributions in statistics, and also with linked brushing between plots in…

Assessing and Visualizing Simultaneous Simulation Error J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-18
Nathan Robertson; James M. Flegal; Dootika Vats; Galin L. Jones. Monte Carlo experiments produce samples in order to estimate features such as means and quantiles of a given distribution. However, simultaneous estimation of means and quantiles has received little attention. In this setting we establish a multivariate central limit theorem for any finite combination of sample means and quantiles under the assumption of a strongly mixing process, which includes the…

Simultaneous Dimensionality and Complexity Model Selection for Spectral Graph Clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-16
Congyuan Yang; Carey E. Priebe; Youngser Park; David J. Marchette. Our problem of interest is to cluster vertices of a graph by identifying underlying community structure. Among various vertex clustering approaches, spectral clustering is one of the most popular methods because it is easy to implement while often outperforming more traditional clustering algorithms. However, there are two inherent model selection problems in spectral clustering, namely estimating…

Boosting Random Forests to Reduce Bias; One-Step Boosted Forest and its Variance Estimate J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-11
Indrayudh Ghosal; Giles Hooker. In this paper we propose using the principle of boosting to reduce the bias of a random forest prediction in the regression setting. From the original random forest fit we extract the residuals and then fit another random forest to these residuals. We call the sum of these two random forests a one-step boosted forest. We show with simulated and real data that the one-step boosted forest has a reduced…
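The two-stage recipe in the abstract (fit, take residuals, fit the residuals, predict with the sum) does not depend on the base learner being a forest. Below is a minimal sketch with a k-nearest-neighbour regressor standing in for the random forest, and in-sample residuals standing in for the out-of-sample ones a forest would supply, so it illustrates the principle rather than the paper's estimator.

```python
# Residual boosting in one step: base fit + a second fit to the residuals.
# A k-NN regressor plays the role of the random forest to keep things tiny.
def knn_predict(train_x, train_y, x0, k=3):
    idx = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in idx) / k

def one_step_boost(train_x, train_y, x0, k=3):
    # in-sample residuals of the first-stage fit (a forest would use
    # out-of-bag predictions here to avoid overfitting)
    resid = [y - knn_predict(train_x, train_y, x, k) for x, y in zip(train_x, train_y)]
    return knn_predict(train_x, train_y, x0, k) + knn_predict(train_x, resid, x0, k)

xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]          # smooth increasing signal
```

Averaging-type learners are biased on a trend, worst at the boundary: plain k-NN at x0 = 9 averages the three largest responses and undershoots the truth, while the residual stage recovers part of that bias, so the boosted prediction lands strictly closer to the true value.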

Global Consensus Monte Carlo J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-08
Lewis J. Rendell; Adam M. Johansen; Anthony Lee; Nick Whiteley. To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each…

Model-based edge clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-04
Daniel K. Sewell. Relational data can be studied using network analytic techniques which define the network as a set of actors and a set of edges connecting these actors. One important facet of network analysis that receives significant attention is community detection. However, while most community detection algorithms focus on clustering the actors of the network, it is very intuitive to cluster the edges. Connections…

An Exact Auxiliary Variable Gibbs Sampler for a Class of Diffusions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Qi Wang; Vinayak Rao; Yee Whye Teh. Stochastic differential equations (SDEs), or diffusions, are continuous-valued, continuous-time stochastic processes widely used in the applied and mathematical sciences. Simulating paths from these processes is usually an intractable problem, and typically involves time-discretization approximations. We propose an exact Markov chain Monte Carlo sampling algorithm that involves no such time-discretization…

Improving Bayesian Local Spatial Models in Large Data Sets J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Amanda Lenzi; Stefano Castruccio; Håvard Rue; Marc G. Genton. Environmental processes resolved at a sufficiently small scale in space and time inevitably display nonstationary behavior. Such processes are both challenging to model and computationally expensive when the data size is large. Instead of modeling the global nonstationarity explicitly, local models can be applied to disjoint regions of the domain. The choice of the size of these regions is dictated…

Shrinking the Covariance Matrix using Convex Penalties on the Matrix-Log Transformation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Mengxi Yi; David E. Tyler. For q-dimensional data, penalized versions of the sample covariance matrix are important when the sample size is small or modest relative to q. Since the negative log-likelihood under multivariate normal sampling is convex in Σ⁻¹, the inverse of the covariance matrix, it is common to consider additive penalties which are also convex in Σ⁻¹. More recently, Deng and Tsui (2013) and Yu et al. (2017) have…

Quantum Annealing via Path-Integral Monte Carlo with Data Augmentation J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Jianchang Hu; Yazhen Wang. This paper considers quantum annealing in the Ising framework for solving combinatorial optimization problems. The path-integral Monte Carlo simulation approach is often used to approximate quantum annealing and to implement the approximation on classical computers, which is referred to as simulated quantum annealing. In this paper we introduce a data augmentation scheme into simulated quantum annealing and develop…

Nonlinear Variable Selection via Deep Neural Networks J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-09-01
Yao Chen; Qingyi Gao; Faming Liang; Xiao Wang. This paper presents a general framework for high-dimensional nonlinear variable selection using deep neural networks under the framework of supervised learning. The network architecture includes both a selection layer and approximation layers. The problem can be cast as a sparsity-constrained optimization with a sparse parameter in the selection layer and other parameters in the approximation layers…

Reduced-dimensional Monte Carlo Maximum Likelihood for Latent Gaussian Random Field Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-24
Jaewoo Park; Murali Haran. Monte Carlo maximum likelihood (MCML) provides an elegant approach to finding maximum likelihood estimators (MLEs) for latent variable models. However, MCML algorithms are computationally expensive when the latent variables are high-dimensional and correlated, as is the case for latent Gaussian random field models. Latent Gaussian random field models are widely used, for example in building flexible regression…

Nonstationary modeling with sparsity for spatial data via the basis graphical lasso J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-19
Mitchell Krock; William Kleiber; Stephen Becker. Many modern spatial models express the stochastic variation component as a basis expansion with random coefficients. Low-rank models, approximate spectral decompositions, multiresolution representations, stochastic partial differential equations, and empirical orthogonal functions all fall within this basic framework. Given a particular basis, stochastic dependence relies on flexible modeling of the…

Dimension reduction for outlier detection using DOBIN J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-18
Sevvandi Kandanaarachchi; Rob J. Hyndman. This paper introduces DOBIN, a new approach to selecting a set of basis vectors tailored for outlier detection. DOBIN has a simple mathematical foundation and can be used as a dimension reduction tool for outlier detection tasks. We demonstrate the effectiveness of DOBIN on an extensive data repository by comparing the performance of outlier detection methods using DOBIN and other bases. We further illustrate…

Functional regression for densely observed data with novel regularization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Ruiyan Luo; Xin Qi. The smoothness penalty is an efficient regularization method in functional data analysis. However, for a spiky coefficient function, which may arise when densely observed spiky functional data are involved, the traditional smoothness penalty can be too strong and lead to an oversmoothed estimate. In this paper, we propose a new family of smoothness penalties which are expressed using wavelet coefficients…

Fast Search and Estimation of Bayesian Nonparametric Mixture Models Using a Classification Annealing EM Algorithm J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
George Karabatsos. Bayesian nonparametric (BNP) infinite-mixture models provide flexible and accurate density estimation, cluster analysis, and regression. However, for the posterior inference of such a model, MCMC algorithms are complex, often need to be tailor-made for different BNP priors, and are intractable for large data sets. We introduce a BNP classification annealing EM algorithm which employs importance sampling…

Trace Ratio Optimization for High-Dimensional Multi-Class Discrimination J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Jeongyoun Ahn; Hee Cheol Chung; Yongho Jeon. In multi-class discrimination with high-dimensional data, identifying a lower-dimensional subspace with maximum class separation is crucial. We propose a new optimization criterion for finding such a discriminant subspace, which is the ratio of two traces: the trace of the between-class scatter matrix and the trace of the within-class scatter matrix. Since this problem is not well-defined for high-dimensional…

Spectrally Sparse Nonparametric Regression via Elastic Net Regularized Smoothers J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Nathaniel E. Helwig. Nonparametric regression frameworks, such as generalized additive models (GAMs) and smoothing spline analysis of variance (SSANOVA) models, extend the generalized linear model (GLM) by allowing for unknown functional relationships between an exponential family response variable and a collection of predictor variables. The unknown functional relationships are typically estimated using penalized likelihood…

Model-Free Variable Selection with Matrix-Valued Predictors J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-14
Zeda Li; Yuexiao Dong. We introduce a novel framework for model-free variable selection with matrix-valued predictors. To test the importance of rows, columns, and submatrices of the predictor matrix in terms of predicting the response, three types of hypotheses are formulated under a unified framework. The asymptotic properties of the test statistics under the null hypothesis are established, and a permutation testing algorithm…

Anomaly Detection in High Dimensional Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-13
Priyanga Dilini Talagala; Rob J. Hyndman; Kate Smith-Miles. The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation whose k-nearest…
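The abstract's definition (an anomaly characterized through its k-nearest-neighbour distance) is easy to make concrete. Below is a minimal nearest-neighbour distance score, an illustration of the general idea rather than the authors' algorithm, which adds theory for choosing the threshold on such scores.

```python
# k-NN distance anomaly score: each observation is scored by its Euclidean
# distance to its k-th nearest neighbour; large scores flag anomalies.
def knn_distance_scores(points, k=1):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores

# four tightly clustered points and one far-away observation
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_distance_scores(pts)
```

The clustered points all sit within 0.1 of a neighbour, while the isolated point's nearest neighbour is several units away, so it dominates the score and is the natural anomaly.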

Marginally calibrated deep distributional regression J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-13
Nadja Klein; David J. Nott; Michael Stanley Smith. Deep neural network (DNN) regression models are widely used in applications requiring state-of-the-art predictive accuracy. However, until recently there has been little work on accurate uncertainty quantification for predictions from such models. We add to this literature by outlining an approach to constructing predictive distributions that are ‘marginally calibrated’. This is where the long-run…

An efficient algorithm for minimizing multi nonsmooth component functions J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-08-05
Minh Pham; Anh Ninh; Hoang Le; Yufeng Liu. Many problems in statistics and machine learning can be formulated as the optimization of a finite sum of nonsmooth convex functions. We propose an algorithm to minimize this type of objective function based on the idea of alternating linearization. Our algorithm retains the simplicity of contemporary methods without any restrictive assumptions on the smoothness of the loss function. We apply…

Model interpretation through lower-dimensional posterior summarization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-07-21
Spencer Woody; Carlos M. Carvalho; Jared S. Murray. Nonparametric regression models have recently surged in their power and popularity, accompanying the trend of increasing dataset size and complexity. While these models have proven their predictive ability in empirical settings, they are often difficult to interpret and do not address the underlying inferential goals of the analyst or decision maker. In this paper, we propose a modular two-stage approach…

U-statistical inference for hierarchical clustering J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-07-20
Marcio Valk; Gabriela Bettella Cybis. Clustering methods are valuable tools for the identification of patterns in high-dimensional data with applications in many scientific fields. However, quantifying uncertainty in clustering is a challenging problem, particularly when dealing with High Dimension Low Sample Size (HDLSS) data. We develop a U-statistics-based clustering approach that assesses statistical significance in clustering and…

mcvis: A new framework for collinearity discovery, diagnostic and visualization J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-30
Chen Lin; Kevin Wang; Samuel Mueller. Collinearity discovery through diagnostic tools is an important analysis step when performing linear regression. Despite their widespread use, collinearity indices such as the variance inflation factor and the condition number have limitations and may not be effective in some applications. In this article we contribute to the study of conventional collinearity indices through theoretical and…

Sparse Single Index Models for Multivariate Responses J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-30
Yuan Feng; Luo Xiao; Eric C. Chi. Joint models are popular for analyzing data with multivariate responses. We propose a sparse multivariate single index model, where responses and predictors are linked by unspecified smooth functions and multiple matrix-level penalties are employed to select predictors and induce low-rank structures across responses. An alternating direction method of multipliers (ADMM) based algorithm is proposed…

Optimal Sampling for Generalized Linear Models under Measurement Constraints J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-08
Tao Zhang; Yang Ning; David Ruppert. Under “measurement constraints,” responses are expensive to measure and initially unavailable for most records in the dataset, but the covariates are available for the entire dataset. Our goal is to sample a relatively small portion of the dataset where the expensive responses will be measured and the resulting sampling estimator is statistically efficient. Measurement constraints require the sampling…

Bayesian spatial clustering of extremal behaviour for hydrological variables J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-04
Christian Rohrbeck; Jonathan A. Tawn. To address the need for efficient inference for a range of hydrological extreme value problems, spatial pooling of information is the standard approach for marginal tail estimation. We propose the first extreme value spatial clustering methods which account for both the similarity of the marginal tails and the spatial dependence structure of the data to determine the appropriate level of pooling. Spatial…

Illumination depth J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-04
Stanislav Nagy; Jiří Dvořák. The concept of illumination bodies studied in convex geometry is used to amend the halfspace depth for multivariate data. The proposed notion of illumination enables finer resolution of the sample points, naturally breaks ties in the associated depth-based ordering, and introduces a depth-like function for points outside the convex hull of the support of the probability measure. The illumination is…

Surrogate Residuals for Discrete Choice Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-02
Chao Cheng; Rui Wang; Heping Zhang. Discrete choice models (DCMs) are a class of models for modelling response variables that take values from a set of alternatives. Examples include logistic regression, probit regression, and multinomial logistic regression. These models are also referred to collectively as generalized linear models. Although there exist methods for assessing the goodness of fit of DCMs, defining intuitive residuals for such models…

Delayed acceptance ABC-SMC J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-06-02
Richard G. Everitt; Paulina A. Rowińska. Approximate Bayesian computation (ABC) is now an established technique for statistical inference used in cases where the likelihood function is computationally expensive or not available. It relies on the use of a model that is specified in the form of a simulator, and approximates the likelihood at a parameter value θ by simulating auxiliary data sets x and evaluating the distance of x from the true…

Correction J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-28
(2020). Correction. Journal of Computational and Graphical Statistics: Vol. 29, No. 3, pp. II.

Efficient Parameter Sampling for Markov Jump Processes J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Boqian Zhang; Vinayak Rao. Markov jump processes are continuous-time stochastic processes widely used in a variety of applied disciplines. Inference typically proceeds via Markov chain Monte Carlo, the state of the art being a uniformization-based auxiliary variable Gibbs sampler. This was designed for situations where the process parameters are known, and Bayesian inference over unknown parameters is typically carried out by…

Automated Redistricting Simulation Using Markov Chain Monte Carlo J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Benjamin Fifield; Michael Higgins; Kosuke Imai; Alexander Tarr. Legislative redistricting is a critical element of representative democracy. A number of political scientists have used simulation methods to sample redistricting plans under various constraints to assess their impact on partisanship and other aspects of representation. However, while many optimization algorithms have been proposed, surprisingly few simulation methods exist in the published scholarship…

Predicting the Output From a Stochastic Computer Model When a Deterministic Approximation is Available J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-07
Evan Baker; Peter Challenor; Matt Eames. Statistically modeling the output of a stochastic computer model can be difficult to do accurately without a large simulation budget. We alleviate this problem by exploiting readily available deterministic approximations to efficiently learn about the respective stochastic computer models. This is done via the summation of two Gaussian processes: one responsible for modeling the deterministic approximation…

Identifying Heterogeneous Effect using Latent Supervised Clustering with Adaptive Fusion J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-06
Jingxiang Chen; Quoc Tran-Dinh; Michael R. Kosorok; Yufeng Liu. Precision medicine is an important area of research with the goal of identifying the optimal treatment for each individual patient. In the literature, various methods have been proposed to divide the population into subgroups according to the heterogeneous effects of individuals. In this paper, a new exploratory machine learning tool, named latent supervised clustering, is proposed to identify the heterogeneous…

Massive parallelization boosts big Bayesian multidimensional scaling J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-05-05
Andrew J. Holbrook; Philippe Lemey; Guy Baele; Simon Dellicour; Dirk Brockmann; Andrew Rambaut; Marc A. Suchard. Big Bayes is the computationally intensive co-application of big data and large, expressive Bayesian models for the analysis of complex phenomena in scientific inference and statistical learning. Standing as an example, Bayesian multidimensional scaling (MDS) can help scientists learn viral trajectories through space-time, but its computational burden prevents its wider use. Crucial MDS model calculations…

Automatic Transformation and Integration to Improve Visualization and Discovery of Latent Effects in Imaging Data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-27
Gregory J. Hunt; Mark A. Dane; James E. Korkola; Laura M. Heiser; Johann A. Gagnon-Bartsch. Proper data transformation is an essential part of analysis. Choosing appropriate transformations for variables can enhance visualization, improve the efficacy of analytical methods, and increase data interpretability. However, determining appropriate transformations of variables from high-content imaging data poses new challenges. Imaging data produce hundreds of covariates from each of thousands of images…

Rerandomization strategies for balancing covariates using pre-experimental longitudinal data J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2020-04-21
Per Johansson; Mårten Schultzberg. Abstract: This paper considers experimental design based on the strategy of rerandomization to increase the efficiency of experiments. Two aspects of rerandomization are addressed. First, we propose a two-stage allocation sample scheme for randomization inference to the units in experiments that guarantees that the difference-in-means estimator is an unbiased estimator of the sample average treatment…

A Pliable Lasso J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-05
Robert Tibshirani; Jerome Friedman. We propose a generalization of the lasso that allows the model coefficients to vary as a function of a general set of prespecified modifying variables. These modifiers might be variables such as gender, age, or time. The paradigm is quite general, with each lasso coefficient modified by a sparse linear function of the modifying variables Z. The model is estimated in a hierarchical fashion to control…

Bivariate Residual Plots With Simulation Polygons J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-06
Rafael A. Moral; John Hinde; Clarice G. B. Demétrio. When using univariate models, goodness of fit can be assessed through many different methods, including graphical tools such as half-normal plots with a simulation envelope. This is straightforward due to the notion of ordering of a univariate sample, which can readily reveal possible outliers. In the bivariate case, however, it is often difficult to detect extreme points and verify whether a sample…

Estimating Time-Varying Graphical Models J. Comput. Graph. Stat. (IF 2.319) Pub Date: 2019-09-03
Jilei Yang; Jie Peng. In this article, we study time-varying graphical models based on data measured over a temporal grid. Such models are motivated by the need to describe and understand evolving interacting relationships among a set of random variables in many real applications, for instance, the study of how stock prices interact with each other and how such interactions change over time. We propose a new model, LOcal…