
-
Smooth and probabilistic PARAFAC model with auxiliary covariates J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-15 Leying Guan
As immunological and clinical studies become more complex, there is an increasing need to analyze temporal immunophenotypes alongside demographic and clinical covariates, where each subject receive...
-
Gibbs Sampler for Matrix Generalized Inverse Gaussian Distributions J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-15 Yasuyuki Hamura, Kaoru Irie, Shonosuke Sugasawa
Abstract Sampling from matrix generalized inverse Gaussian (MGIG) distributions is required in Markov Chain Monte Carlo (MCMC) algorithms for a variety of statistical models. However, an efficient sampling scheme for the MGIG distributions has not been fully developed. We here propose a novel blocked Gibbs sampler for the MGIG distributions based on the Cholesky decomposition. We show that the full
-
Generalized Variable Selection Algorithms for Gaussian Process Models by LASSO-like Penalty J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-15 Zhiyong Hu, Dipak K. Dey
Abstract With the rapid development of modern technology, massive amounts of data with complex pattern are generated. Gaussian process models that can easily fit the non-linearity in data become more and more popular nowadays. It is often the case that in some data only a few features are important or active. However, unlike classical linear models, it is challenging to identify active variables in
-
Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-14 Peng Zhang, Ben Liu, Jingjing Pan
Abstract An iteratively reweighted least squares (IRLS) method is proposed for the estimation of polyserial and polychoric correlation coefficients in this paper. It calculates the slopes in a series of weighted linear regression models fitting on conditional expected values. For polyserial correlation, conditional expectations of the latent predictor is derived from the observed ordinal categorical
-
Convolutional neural networks for valid and efficient causal inference J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-12 Mohammad Ghasempour, Niloofar Moosavi, Xavier de Luna
Abstract Convolutional neural networks (CNN) have been successful in machine learning applications. Their success relies on their ability to consider space invariant local features. We consider the use of CNN to fit nuisance models in semiparametric estimation of the average causal effect of a treatment. In this setting, nuisance models are functions of pre-treatment covariates that need to be controlled
-
Quasi-Newton Acceleration of EM and MM Algorithms via Broyden’s Method J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-15 Medha Agarwal, Jason Xu
The principle of majorization-minimization (MM) provides a general framework for eliciting effective algorithms to solve optimization problems. However, the resulting methods often suffer from slow...
-
On exact computation of Tukey depth central regions J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-11 Vít Fojtík, Petra Laketa, Pavlo Mozharovskyi, Stanislav Nagy
Abstract The Tukey (or halfspace) depth extends nonparametric methods toward multivariate data. The multivariate analogues of the quantiles are the central regions of the Tukey depth, defined as sets of points in the d-dimensional space whose Tukey depth exceeds given thresholds k. We address the problem of fast and exact computation of those central regions. First, we analyse an efficient Algorithm
-
Clustering sequence data with mixture Markov chains with covariates using multiple simplex constrained optimization routine (MSiCOR) J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-11 Priyam Das, Deborshee Sen, Debsurya De, Jue Hou, Zahra S. H. Abad, Nicole Kim, Zongqi Xia, Tianxi Cai
Abstract Mixture Markov Model (MMM) is a widely used tool to cluster sequences of events coming from a finite state-space. However the MMM likelihood being multi-modal, the challenge remains in its maximization. Although Expectation-Maximization (EM) algorithm remains one of the most popular ways to estimate the MMM parameters, however convergence of EM algorithm is not always guaranteed. Given the
-
Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-11 William Consagra, Arun Venkataraman, Xing Qiu
Abstract In areas ranging from neuroimaging to climate science, advances in data storage and sensor technology have led to a proliferation in multidimensional functional datasets. A common approach to analyzing functional data is to first map the discretely observed functional samples into continuous representations, and then perform downstream statistical analysis on these smooth representations.
-
Improved Pathwise Coordinate Descent for Power Penalties J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-08 Maryclare Griffin
Abstract Pathwise coordinate descent algorithms have been used to compute entire solution paths for lasso and other penalized regression problems quickly with great success. They improve upon cold start algorithms by solving the problems that make up the solution path sequentially for an ordered set of tuning parameter values, instead of solving each problem separately. However, extending pathwise
-
A New Basis for Sparse Principal Component Analysis J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-08 Fan Chen, Karl Rohe
Previous versions of sparse principal component analysis (PCA) have presumed that the eigen-basis (a p×k matrix) is approximately sparse. We propose a method that presumes the p×k matrix becomes ...
-
EM algorithm for the estimation of the RETAS model J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-07 Tom Stindl, Feng Chen
Abstract The Renewal Epidemic-Type Aftershock Sequence (RETAS) model is a recently proposed point process model that can fit event sequences such as earthquakes better than pre-existing models. Evaluating the log-likelihood function and directly maximizing it has been shown to be a viable approach to obtain the maximum likelihood estimator (MLE) of the RETAS model. However, the direct likelihood maximization
-
Multiple Imputation Through XGBoost J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-09-01 Yongshi Deng, Thomas Lumley
The use of multiple imputation (MI) is becoming increasingly popular for addressing missing data. Although some conventional MI approaches have been well studied and have shown empirical validity, ...
-
A Simple Divide-and-Conquer-based Distributed Method for the Accelerated Failure Time Model J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-08-31 Lanjue Chen, Jin Su, Alan T.K. Wan, Yong Zhou
Abstract The accelerated failure time (AFT) model is an appealing tool in survival analysis because of its ease of interpretation, but when there is a large volume of data, fitting an AFT model and carrying out the associated inference on one computer can be computationally demanding. This poses a severe limitation for the application of the AFT model in the face of big data. The present paper addresses
-
General Nonlinear Function-on-Function Regression via Functional Universal Approximation J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-08-29 Ruiyan Luo, Xin Qi
Abstract Various linear or nonlinear function-on-function (FOF) regression models have been proposed to study the relationship between functional variables, where certain forms are assumed for the relationship. However, because functional variables take values in infinite-dimensional spaces, the relationships between them can be much more complicated than those between scalar variables. The forms in
-
Bayesian Multi-task Variable Selection with an Application to Differential DAG Analysis J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-08-28 Guanxun Li, Quan Zhou
Abstract We study the Bayesian multi-task variable selection problem, where the goal is to select activated variables for multiple related data sets simultaneously. We propose a new variational Bayes algorithm which generalizes and improves the recently developed “sum of single effects” model of Wang et al. (2020a). Motivated by differential gene network analysis in biology, we further extend our method
-
Supervised Principal Component Regression for Functional Responses with High Dimensional Predictors J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-08-21 Xinyi Zhang, Qiang Sun, Dehan Kong
Abstract We propose a supervised principal component regression method for relating functional responses with high dimensional predictors. Unlike the conventional principal component analysis, the proposed method builds on a newly defined expected integrated residual sum of squares, which directly makes use of the association between the functional response and the predictors. Minimizing the integrated
-
Change point detection in dynamic networks via regularized tensor decomposition J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-08-04 Yuzhao Zhang, Jingnan Zhang, Yifan Sun, Junhui Wang
Abstract Dynamic network captures time-varying interactions among multiple entities at different time points, and detecting its structural change points is of central interest. This paper proposes a novel method for detecting change points in dynamic networks by fully exploiting the latent network structure. The proposed method builds upon a tensor-based embedding model, which models the time-varying
-
Local Gaussian process extrapolation for BART models with applications to causal inference J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-26 Meijia Wang, Jingyu He, P. Richard Hahn
Abstract Bayesian additive regression trees (BART) is a semi-parametric regression model offering state-of-the-art performance on out-of-sample prediction. Despite this success, standard implementations of BART typically suffer from inaccurate prediction and overly narrow prediction intervals at points outside the range of the training data. This paper proposes a novel extrapolation strategy that grafts
-
Maximum Likelihood Estimation of Hierarchical Linear Models from Incomplete Data: Random Coefficients, Statistical Interactions, and Measurement Error J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-13 Yongyun Shin, Stephen W. Raudenbush
Abstract We consider two-level models where a continuous response R and continuous covariates C are assumed missing at random. Inferences based on maximum likelihood or Bayes are routinely made by estimating their joint normal distribution from observed data R_obs and C_obs. However, if the model for R given C includes random coefficients, interactions, or polynomial terms, their joint distribution
-
Covariance–based rational approximations of fractional SPDEs for computationally efficient Bayesian inference J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-13 David Bolin, Alexandre B. Simas, Zhen Xiong
The stochastic partial differential equation (SPDE) approach is widely used for modeling large spatial datasets. It is based on representing a Gaussian random field u on R^d as the solution of an e...
-
Structured Shrinkage Priors J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-12 Maryclare Griffin, Peter D. Hoff
Abstract In many regression settings the unknown coefficients may have some known structure, for instance they may be ordered in space or correspond to a vectorized matrix or tensor. At the same time, the unknown coefficients may be sparse, with many nearly or exactly equal to zero. However, many commonly used priors and corresponding penalties for coefficients do not encourage simultaneously structured
-
Improving and Extending STERGM Approximations Based on Cross-Sectional Data and Tie Durations J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-07 Chad Klumb, Martina Morris, Steven M. Goodreau, Samuel M. Jenness
Abstract Temporal exponential-family random graph models (TERGMs) are a flexible class of models for network ties that change over time. Separable TERGMs (STERGMs) are a subclass of TERGMs in which the dynamics of tie formation and dissolution can be separated within each discrete time step and may depend on different factors. The Carnegie et al. (2015) approximation improves estimation efficiency
-
Conditional particle filters with bridge backward sampling J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-07 Santeri Karppinen, Sumeetpal S. Singh, Matti Vihola
Abstract Conditional particle filters (CPFs) with backward/ancestor sampling are powerful methods for sampling from the posterior distribution of the latent states of a dynamic model such as a hidden Markov model. However, the performance of these methods deteriorates with models involving weakly informative observations and/or slowly mixing dynamics. Both of these complications arise when sampling
-
Exactly Uncorrelated Sparse Principal Component Analysis J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-07 Oh-Ran Kwon, Zhaosong Lu, Hui Zou
Abstract Sparse principal component analysis (PCA) aims to find principal components as linear combinations of a subset of the original input variables without sacrificing the fidelity of the classical PCA. Most existing sparse PCA methods produce correlated sparse principal components. We argue that many applications of PCA prefer uncorrelated principal components. However, handling sparsity and uncorrelatedness
-
Fast Community Detection in Dynamic and Heterogeneous Networks J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-07 Maoyu Zhang, Jingfei Zhang, Wenlin Dai
Abstract Dynamic heterogeneous networks describe the temporal evolution of interactions among nodes and edges of different types. While there is a rich literature on finding communities in dynamic networks, the application of these methods to dynamic heterogeneous networks can be inappropriate, due to the involvement of different types of nodes and edges and the need to treat them differently. In this
-
Functional Nonlinear Learning J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-07-07 Haixu Wang, Jiguo Cao
Abstract Using representations of functional data can be more convenient and beneficial in subsequent statistical models than direct observations. These representations, in a lower-dimensional space, extract and compress information from individual curves. The existing representation learning approaches in functional data analysis usually use linear mapping in parallel to those from multivariate analysis
-
A Relaxation Approach to Feature Selection for Linear Mixed Effects Models J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Aleksei Sholokhov, James V. Burke, Damian F. Santomauro, Peng Zheng, Aleksandr Aravkin
Abstract Linear Mixed-Effects (LME) models are a fundamental tool for modeling correlated data, including cohort studies, longitudinal data analysis, and meta-analysis. Design and analysis of variable selection methods for LMEs is more difficult than for linear regression because LME models are nonlinear. In this work we propose a novel optimization strategy that enables a wide range of variable selection
-
Bayesian heterogeneous hidden Markov models with an unknown number of states J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Yudan Zou, Yiqi Lin, Xinyuan Song
Abstract Hidden Markov models (HMMs) are valuable tools for analyzing longitudinal data due to their capability to describe dynamic heterogeneity. Conventional HMMs typically assume that the number of hidden states (i.e., the order of HMMs) is known or predetermined through criterion-based methods. However, prior knowledge about the order is often unavailable, and a pairwise comparison using criterion-based
-
Accelerated and interpretable oblique random survival forests J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Byron C. Jaeger, Sawyer Welden, Kristin Lenoir, Jaime L. Speiser, Matthew W. Segar, Ambarish Pandey, Nicholas M. Pajewski
Abstract The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition
-
A Scalable Method to Exploit Screening in Gaussian Process Models with Noise J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Christopher J. Geoga, Michael L. Stein
Abstract A common approach to approximating Gaussian log-likelihoods at scale exploits the fact that precision matrices can be well-approximated by sparse matrices in some circumstances. This strategy is motivated by the screening effect, which refers to the phenomenon in which the linear prediction of a process Z at a point x_0 depends primarily on measurements nearest to x_0. But simple perturbations
-
Statistically Valid Variational Bayes Algorithm for Ising Model Parameter Estimation J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Minwoo Kim, Shrijita Bhattacharya, Tapabrata Maiti
Abstract Ising models originated in statistical physics and are widely used in modeling spatial data and computer vision problems. However, statistical inference of this model remains challenging due to intractable nature of the normalizing constant in the likelihood. Here, we use a pseudo-likelihood instead, to study the Bayesian estimation of two-parameter, inverse temperature and magnetization,
-
Approximating Partial Likelihood Estimators via Optimal Subsampling J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-30 Haixiang Zhang, Lulu Zuo, HaiYing Wang, Liuquan Sun
Abstract With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling method to effectively approximate the full data maximum partial likelihood estimator in Cox’s model, which largely reduces the computational burden when
-
Influential Observations in Bayesian Regression Tree Models J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-21 M. T. Pratola, E. I. George, R. E. McCulloch
Abstract Bayesian Classification and Regression Trees (BCART) and Bayesian Additive Regression Trees (BART) are popular Bayesian regression models widely applicable in modern regression problems. Their popularity is intimately tied to the ability to flexibly model complex responses depending on high-dimensional inputs while simultaneously being able to quantify uncertainties. This ability to quantify
-
New and Simplified Manual Controls for Projection and Slice Tours, With Application to Exploring Classification Boundaries in High Dimensions J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-12 Ursula Laa, Alex Aumann, Dianne Cook, German Valencia
Abstract This article describes new user controls for examining high-dimensional data using low-dimensional linear projections and slices. A user can interactively change the contribution of a given variable to a low-dimensional projection, which is useful for exploring the sensitivity of structure to particular variables. The user can also interactively shift the center of a slice, for example, to
-
Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-06-08 Christopher Drovandi, David J. Nott, David T. Frazier
Abstract Likelihood-free methods are an essential tool for performing inference for implicit models which can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to high-dimensional likelihood-free inference involves estimating low-dimensional marginal posteriors
-
Statistical Significance of Clustering with Multidimensional Scaling J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-31 Hui Shen, Shankar Bhamidi, Yufeng Liu
Abstract Clustering is a fundamental tool for exploratory data analysis. One central problem in clustering is deciding if the clusters discovered by clustering methods are reliable as opposed to being artifacts of natural sampling variation. Statistical significance of clustering (SigClust) is a recently developed cluster evaluation tool for high-dimension, low-sample size data. Despite its successful
-
Variable selection and estimation for misclassified binary responses and multivariate error-prone predictors J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-26 Li-Pang Chen
Abstract In statistical analysis or supervised learning, classification has been an attractive topic. Typically, a main goal is to adopt predictors to characterize the primarily interested binary random variables. To model a binary response and predictors, parametric structures, such as logistic regression models or probit models, are perhaps commonly used approaches. However, due to the convenience
-
On simulating skewed and cluster-weighted data for studying performance of clustering algorithms J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-25 Volodymyr Melnykov, Yang Wang, Yana Melnykov, Francesca Torti, Domenico Perrotta, Marco Riani
Abstract In this paper, extensions to the recently introduced concept of pairwise overlap between mixture components are proposed. The notion of overlap is useful for studying the systematic performance of clustering algorithms. Existing methods can be used for simulating elliptical data according to pre-specified overlap characteristics. First, an approach to simulating skewed clusters with a desired
-
A Unified Approach to Variable Selection for Partially Linear Models J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-22 Youhan Lu, Yushen Dong, Juan Hu, Yichao Wu
Abstract We focus on the general partially linear model without any structure assumption on the nonparametric component. For such a model with both linear and nonlinear predictors being multivariate, we propose a new variable selection method. Our new method is a unified approach in the sense that it can select both linear and nonlinear predictors simultaneously by solving a single optimization problem
-
Massive Parallelization of Massive Sample-size Survival Analysis J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-12 Jianxiao Yang, Martijn J. Schuemie, Xiang Ji, Marc A. Suchard
Abstract Large-scale observational health databases are increasingly popular for conducting comparative effectiveness and safety studies of medical products. However, increasing number of patients poses computational challenges when fitting survival regression models in such studies. In this paper, we use graphics processing units (GPUs) to parallelize the computational bottlenecks of massive sample-size
-
Concave-Convex PDMP-based sampling J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-05 Matthew Sutton, Paul Fearnhead
Abstract Recently non-reversible samplers based on simulating piecewise deterministic Markov processes (PDMPs) have shown potential for efficient sampling in Bayesian inference problems. However, there remains a lack of guidance on how to best implement these algorithms. If implemented poorly, the computational costs of simulating event times can out-weigh the statistical efficiency of the non-reversible
-
Learning Block Structured Graphs in Gaussian Graphical Models J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-03 Alessandro Colombi, Raffaele Argiento, Lucia Paci, Alessia Pini
Abstract A prior distribution for the underlying graph is introduced in the framework of Gaussian graphical models. Such a prior distribution induces a block structure in the graph’s adjacency matrix, allowing learning relationships between fixed groups of variables. A novel sampling strategy named Double Reversible Jumps Markov chain Monte Carlo is developed for learning block structured graphs under
-
On the Use of Minimum Penalties in Statistical Learning J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-05-03 Ben Sherwood, Bradley S. Price
Abstract Modern multivariate machine learning and statistical methodologies estimate parameters of interest while leveraging prior knowledge of the association between outcome variables. The methods that do allow for estimation of relationships do so typically through an error covariance matrix in multivariate regression which does not generalize to other types of models. In this article we proposed
-
Quantizing rare random maps: application to flooding visualization J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-25 Charlie Sire, Rodolphe Le Riche, Didier Rullière, Jérémy Rohmer, Lucie Pheulpin, Yann Richet
Abstract Visualization is an essential operation when assessing the risk of rare events such as coastal or river floodings. The goal is to display a few prototype events that best represent the probability law of the observed phenomenon, a task known as quantization. It becomes a challenge when data is expensive to generate and critical events are scarce, like extreme natural hazard. In the case of
-
Local inhomogeneous weighted summary statistics for marked point processes J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-25 Nicoletta D’Angelo, Giada Adelfio, Jorge Mateu, Ottmar Cronie
Abstract We introduce a family of local inhomogeneous mark-weighted summary statistics, of order two and higher, for general marked point processes. Depending on how the involved weight function is specified, these summary statistics capture different kinds of local dependence structures. We first derive some basic properties and show how these new statistical tools can be used to construct most existing
-
Bayesian Model Choice for Directional Data J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-24 Christopher J. Fallaize, Theodore Kypraios
Abstract This paper is concerned with the problem of choosing between competing models for directional data. In particular, we consider the question of whether or not two independent samples of axial data come from the same Bingham distribution. This is not a straightforward question to answer, due to the intractable nature of the parameter-dependent normalising constant of the Bingham distribution
-
Model-based Tensor Low-rank Clustering J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-20 Junge Li, Qing Mai
Abstract Tensors have become prevalent in business applications and scientific studies. It is of great interest to analyze and understand the heterogeneity in tensor-variate observations. We propose a novel tensor low-rank mixture model (TLMM) to conduct efficient estimation and clustering on tensors. The model combines the Tucker low-rank structure in mean contrasts and the separable covariance structure
-
EDI-GRAPHIC: A TOOL TO STUDY PARAMETER DISCRIMINATION AND CONFIRM IDENTIFIABILITY IN BLACK-BOX MODELS, AND TO SELECT DATA-GENERATING MACHINES J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-20 Yannis G. Yatracos
Summary In a Data-Generating Experiment (DGE), the data, X, is often obtained from a Black-Box and is approximated with a learning machine/sampler, f(Y,θ);θ∈Θ, Y is random, f is known. When X has unknown c.d.f., F_θ, non-identifiability of θ cannot be confirmed and may limit the predictive accuracy of the learned model, f(Y,θ̂);θ̂ estimate of θ. Using properties of the Expected P-value for the Kolmogorov-Smirnov
-
Robust Transformations for Multiple Regression via Additivity and Variance Stabilization J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-20 Marco Riani, Anthony C. Atkinson, Aldo Corbellini
Abstract Outliers can have a major effect on the estimated transformation of the response in linear regression models, as they can on the estimates of the coefficients of the fitted model. The effect is more extreme in the Generalized Additive Models (GAMs) that are the subject of this paper, as the forms of terms in the model can also be affected. We develop, describe and illustrate robust methods
-
Statistical Analysis of Fixed Mini-Batch Gradient Descent Estimator J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-18 Haobo Qi, Feifei Wang, Hansheng Wang
Abstract We study here a fixed mini-batch gradient descent (FMGD) algorithm to solve optimization problems with massive datasets. In FMGD, the whole sample is split into multiple non-overlapping partitions. Once the partitions are formed, they are then fixed throughout the rest of the algorithm. For convenience, we refer to the fixed partitions as fixed mini-batches. Then for each computation iteration
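The fixed-partition idea in this abstract can be illustrated with a minimal sketch. It assumes a least-squares loss and a cyclic pass over the batches with a constant learning rate; the abstract is cut off before the per-iteration rule, so those choices, along with the function name `fixed_minibatch_gd` and its parameters, are illustrative assumptions rather than the authors' algorithm.

```python
import numpy as np

def fixed_minibatch_gd(X, y, n_batches=5, lr=0.1, n_epochs=200, seed=0):
    """Least-squares gradient descent over fixed, non-overlapping mini-batches.

    Assumption: batches are visited cyclically with a constant step size.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Partition the sample once; the partition is never reshuffled afterwards.
    batches = np.array_split(rng.permutation(n), n_batches)
    beta = np.zeros(p)
    for _ in range(n_epochs):
        for idx in batches:
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error (up to a factor 2) on this batch.
            grad = Xb.T @ (Xb @ beta - yb) / len(idx)
            beta -= lr * grad
    return beta, batches

# Usage on synthetic, noise-free data (hypothetical example values).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true
beta_hat, batches = fixed_minibatch_gd(X, y)
```

Because the partition is drawn once, every epoch sees exactly the same batches, which is the feature that distinguishes this scheme from mini-batch methods that reshuffle.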
-
Comparison and Bayesian Estimation of Feature Allocations J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-17 David B. Dahl, Devin J. Johnson, R. Jacob Andros
Abstract Feature allocation models postulate a sampling distribution whose parameters are derived from shared features. Bayesian models place a prior distribution on the feature allocation, and Markov chain Monte Carlo is typically used for model fitting, which results in thousands of feature allocations sampled from the posterior distribution. Based on these samples, we propose a method to provide
-
A generalization gap estimation for overparameterized models via the Langevin functional variance J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-04-04 Akifumi Okuno, Keisuke Yano
Abstract This paper discusses the estimation of the generalization gap, the difference between generalization performance and training performance, for overparameterized models including neural networks. We first show that a functional variance, a key concept in defining a widely-applicable information criterion, characterizes the generalization gap even in overparameterized settings where a conventional
-
Bootstrap Confidence Regions for Learned Feature Embeddings J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-31 Kris Sankaran
Abstract Algorithmic feature learners provide high-dimensional vector representations for non-matrix structured data, like image or text collections. Low-dimensional projections derived from these representations, called embeddings, are often used to explore variation in these data. However, it is not clear how to assess the embedding uncertainty. We adapt methods developed for bootstrapping principal
-
Biconvex Clustering J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-31 Saptarshi Chakraborty, Jason Xu
Abstract Convex clustering has recently garnered increasing interest due to its attractive theoretical and computational properties, but its merits become limited in the face of high-dimensional data. In such settings, pairwise affinity terms that rely on k-nearest neighbors become poorly specified and Euclidean measures of fit provide weaker discriminating power. To surmount these issues, we propose
-
A quantum parallel Markov chain Monte Carlo J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-31 Andrew J. Holbrook
Abstract We propose a novel hybrid quantum computing strategy for parallel MCMC algorithms that generate multiple proposals at each step. This strategy makes the rate-limiting step within parallel MCMC amenable to quantum parallelization by using the Gumbel-max trick to turn the generalized accept-reject step into a discrete optimization problem. When combined with new insights from the parallel MCMC
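The Gumbel-max trick mentioned in this abstract turns sampling from a discrete distribution into an argmax problem: adding independent standard Gumbel noise to the log-weights and taking the index of the maximum yields a draw with probability proportional to the weights. The sketch below shows only this classical trick in NumPy; the function name and the example weights are illustrative, and it does not attempt the quantum or parallel MCMC parts of the paper.

```python
import numpy as np

def gumbel_max_sample(log_weights, n_draws, rng):
    """Draw indices with probability proportional to exp(log_weights)."""
    log_weights = np.asarray(log_weights, dtype=float)
    # One independent Gumbel(0, 1) perturbation per (draw, category) pair.
    noise = rng.gumbel(size=(n_draws, log_weights.size))
    # The argmax of the perturbed log-weights is a categorical sample.
    return np.argmax(log_weights + noise, axis=1)

# Usage: compare empirical frequencies with the normalized weights.
rng = np.random.default_rng(0)
log_w = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
draws = gumbel_max_sample(log_w, 20000, rng)
freqs = np.bincount(draws, minlength=4) / draws.size
target = np.exp(log_w) / np.exp(log_w).sum()
```

The weights do not need to be normalized, since adding a constant to every log-weight leaves the argmax unchanged.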
-
Penguins Go Parallel: a grammar of graphics framework for generalized parallel coordinate plots J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-31 Susan Vander Plas, Yawei Ge, Antony Unwin, Heike Hofmann
Abstract Parallel coordinate plots (PCP) are a valuable tool for exploratory data analysis of high-dimensional numerical data. The use of PCPs is limited when working with categorical variables or a mix of categorical and continuous variables. In this paper, we propose generalized parallel coordinate plots (GPCP) to extend the ability of PCPs from just numeric variables to dealing seamlessly with a
-
Interpretable Architecture Neural Networks for Function Visualization J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-29 Shengtong Zhang, Daniel W. Apley
Abstract In many scientific research fields, understanding and visualizing a black-box function in terms of the effects of all the input variables is of great importance. Existing visualization tools do not allow one to visualize the effects of all the input variables simultaneously. Although one can select one or two of the input variables to visualize via a 2D or 3D plot while holding other variables
-
The Apogee to Apogee Path Sampler J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-21 Chris Sherlock, Szymon Urbas, Matthew Ludkin
Abstract Amongst Markov chain Monte Carlo algorithms, Hamiltonian Monte Carlo (HMC) is often the algorithm of choice for complex, high-dimensional target distributions; however, its efficiency is notoriously sensitive to the choice of the integration-time tuning parameter. When integrating both forward and backward in time using the same leapfrog integration step as HMC, the set of apogees, local maxima
-
Test and Visualization of Covariance Properties for Multivariate Spatio-Temporal Random Fields J. Comput. Graph. Stat. (IF 2.4) Pub Date : 2023-03-16 Huang Huang, Ying Sun, Marc G. Genton
Abstract The prevalence of multivariate space-time data collected from monitoring networks and satellites, or generated from numerical models, has brought much attention to multivariate spatio-temporal statistical models, where the covariance function plays a key role in modeling, inference, and prediction. For multivariate space-time data, understanding the spatio-temporal variability, within and