
-
Cross-Validated Loss-Based Covariance Matrix Estimator Selection in High Dimensions J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-08-09 Philippe Boileau, Nima S. Hejazi, Mark J. van der Laan, Sandrine Dudoit
Abstract The covariance matrix plays a fundamental role in many modern exploratory and inferential statistical procedures, including dimensionality reduction, hypothesis testing, and regression. In low-dimensional regimes, where the number of observations far exceeds the number of variables, the optimality of the sample covariance matrix as an estimator of this parameter is well-established. High-dimensional
-
Linear manifold modeling and graph estimation based on multivariate functional data with different coarseness scales J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-08-03 Eugen Pircalabelu, Gerda Claeskens
Abstract We develop a high-dimensional graphical modeling approach for functional data where the number of functions exceeds the available sample size. This is accomplished by proposing a sparse estimator for a concentration matrix when identifying linear manifolds. As such, the procedure extends the ideas of the manifold representation for functional data to high-dimensional settings where the number
-
Mixture of Linear Models Co-supervised by Deep Neural Networks J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-08-01 Beomseok Seo, Lin Lin, Jia Li
Abstract Deep neural networks (DNN) have been demonstrated to achieve unparalleled prediction accuracy in a wide range of applications. Despite their strong performance, in certain areas, the usage of DNN has met resistance because of its black-box nature. In this paper, we propose a new method to estimate a mixture of linear models (MLM) for regression or classification that is relatively easy to interpret
-
Copulas and Histogram-valued Data J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-29 Honghe Jin, L. Billard
Abstract Histogram-valued data are emerging increasingly often as a consequence of the aggregation of large data sets. One statistic that underpins many methodologies, especially regression and principal component analyses, is the covariance function. To date, no method exists for calculating these functions directly from the marginal histogram observations. This article develops techniques through copula
-
A Bayesian Singular Value Decomposition Procedure for Missing Data Imputation J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-29 Ruoshui Zhai, Roee Gutman
Missing data are common in empirical studies. Multiple imputation is a method to handle missing values by replacing them with plausible values. A common imputation method is multiple imputation wit...
-
Ultra-Fast Approximate Inference Using Variational Functional Mixed Models J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-29 Shuning Huo, Jeffrey S Morris, Hongxiao Zhu
Abstract While Bayesian functional mixed models have been shown effective to model functional data with various complex structures, their application to extremely high-dimensional data is limited due to computational challenges involved in posterior sampling. We introduce a new computational framework that enables ultra-fast approximate inference for high-dimensional data in functional form. This framework
-
A Stochastic Approximation-Langevinized Ensemble Kalman Filter Algorithm for State Space Models with Unknown Parameters J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-29 Tianning Dong, Peiyi Zhang, Faming Liang
Abstract Inference for high-dimensional, large scale and long series dynamic systems is a challenging task in modern data science. The existing algorithms, such as particle filter or sequential importance sampler, do not scale well to the dimension of the system and the sample size of the dataset, and often suffer from the sample degeneracy issue for long series data. The recently proposed Langevinized
-
Estimation of the Spatial Weighting Matrix for Spatiotemporal Data under the Presence of Structural Breaks J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-29 Philipp Otto, Rick Steinert
Abstract In this paper, we propose a two-stage LASSO estimation approach for the estimation of a full spatial weights matrix of spatiotemporal autoregressive models. In addition, we allow for an unknown number of structural breaks in the local means of each spatial location. These locally varying mean levels, however, can easily be mistaken as spatial dependence and vice versa. Thus, the proposed approach
-
Design Principles for Data Analysis J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-25 Lucy D’Agostino McGowan, Roger D. Peng, Stephanie C. Hicks
Abstract The data revolution has led to an increased interest in the practice of data analysis. While much has been written about statistical thinking, a complementary form of thinking that appears in the practice of data analysis is design thinking – the problem-solving process to understand the people for whom a solution is being designed. For a given problem, there can be significant or subtle differences
-
Using CVX to construct optimal designs for biomedical studies with multiple objectives J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-22 Weng Kee Wong, Julie Zhou
ABSTRACT Model-based optimal designs for regression problems with multiple objectives are common in practice. The traditional approach is to construct an optimal design for the most important objective and hope that the design performs well for the other objectives. Analytical approaches are challenging because the objectives are often competitive and their relative importance has to be incorporated
-
Template independent component analysis with spatial priors for accurate subject-level brain network estimation and inference J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-22 Amanda F. Mejia, David Bolin, Yu Ryan Yue, Jiongran Wang, Brian S. Caffo, Mary Beth Nebel
Abstract Independent component analysis is commonly applied to functional magnetic resonance imaging (fMRI) data to extract independent components (ICs) representing functional brain networks. While ICA produces reliable group-level estimates, single-subject ICA often produces noisy results. Template ICA is a hierarchical ICA model using empirical population priors to produce more reliable subject-level
-
Micro-Macro Changepoint Inference for Periodic Data Sequences J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-22 Anastasia Ushakova, Simon A. Taylor, Rebecca Killick
Abstract Existing changepoint approaches consider changepoints to occur linearly in time; one changepoint happens after another and they are not linked. However, data processes may have regularly occurring changepoints, e.g. a yearly increase in sales of ice-cream on the first hot weekend. Using linear changepoint approaches here will miss more global features such as a decrease in sales of ice-cream
-
A Simple Algorithm for Exact Multinomial Tests J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-21 Johannes Resin
Abstract This work proposes a new method for computing acceptance regions of exact multinomial tests. From this an algorithm is derived, which finds exact p-values for tests of simple multinomial hypotheses. Using concepts from discrete convex analysis, the method is proven to be exact for various popular test statistics, including Pearson’s chi-square and the log-likelihood ratio. The proposed algorithm
-
Joint Modeling of Longitudinal Imaging and Survival Data J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-18 Kai Kang, Xinyuan Song
Abstract This paper considers a joint modeling framework for simultaneously examining the dynamic pattern of longitudinal and ultrahigh-dimensional images and their effects on the survival of interest. A functional mixed effects model is considered to describe the trajectories of longitudinal images. Then, a high-dimensional functional principal component analysis (HD-FPCA) is adopted to extract the
-
Variable Screening for Sparse Online Regression J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-13 Jingwei Liang, Clarice Poon
Abstract Sparsity-promoting regularizers are widely used to impose low-complexity structure (e.g. ℓ1-norm for sparsity) on the regression coefficients of supervised learning. In the realm of deterministic optimization, the sequence generated by iterative algorithms (such as proximal gradient descent) exhibits the “finite activity identification” property, that is, it can identify the low-complexity structure
-
Multiway sparse distance weighted discrimination J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-13 Bin Guo, Lynn E. Eberly, Pierre-Gilles Henry, Christophe Lenglet, Eric F. Lock
Abstract Modern data often take the form of a multiway array. However, most classification methods are designed for vectors, i.e., 1-way arrays. Distance weighted discrimination (DWD) is a popular high-dimensional classification method that has been extended to the multiway context, with dramatic improvements in performance when data have multiway structure. However, the previous implementation of
-
Triangular Concordance Learning of Networks J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-11 Jiaqi Gu, Guosheng Yin
Abstract Networks are widely used to describe relational data among objects in a complex system. As network data often exhibit clustering structures, research interest often focuses on discovering clusters of nodes. We develop a novel concordance-based method for node clustering in networks, where a linear model is imposed on the latent position of each node with respect to a node-specific center and
-
Fast, Scalable Approximations to Posterior Distributions in Extended Latent Gaussian Models J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-08 Alex Stringer, Patrick Brown, Jamie Stafford
We define a novel class of additive models, called Extended Latent Gaussian Models, that allow for a wide range of response distributions and flexible relationships between the additive predictor a...
-
Exact Bayesian inference for level-set Cox processes with piecewise constant intensity function J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-08 Flávio B. Gonçalves, Bárbara C. C. Dias
Summary This paper proposes a new methodology to perform Bayesian inference for a class of multidimensional Cox processes in which the intensity function is piecewise constant. Poisson processes with piecewise constant intensity functions are believed to be suitable to model a variety of point process phenomena and, given its simpler structure, are expected to provide more precise inference when compared
-
Predictive Subdata Selection for Computer Models J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Ming-Chung Chang
Abstract An explosion in the availability of rich data from the technological advances is hindering efforts at statistical analysis due to constraints on time and memory storage, regardless of whether researchers employ simple methods (e.g., linear regression) or complex models (e.g., Gaussian processes). A recent approach to overcoming these limits involves information-based optimal subdata selection
-
Non-stationary Gaussian process discriminant analysis with variable selection for high-dimensional functional data J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Weichang Yu, Sara Wade, Howard D. Bondell, Lamiae Azizi
Abstract High-dimensional classification and feature selection tasks are ubiquitous with the recent advancement in data acquisition technology. In several application areas such as biology, genomics and proteomics, the data are often functional in their nature and exhibit a degree of roughness and non-stationarity. These structures pose additional challenges to commonly used methods that rely mainly
-
Deep Learning with Functional Inputs J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Barinder Thind, Kevin Multani, Jiguo Cao
Abstract We present a methodology for integrating functional data into deep neural networks. The model is defined for scalar responses with multiple functional and scalar covariates. A by-product of the method is a set of dynamic functional weights that can be visualized during the optimization process. This visualization leads to a greater interpretability of the relationship between the covariates
-
More powerful selective inference for the graph fused lasso J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Yiqun Chen, Sean Jewell, Daniela Witten
Abstract The graph fused lasso — which includes as a special case the one-dimensional fused lasso — is widely used to reconstruct signals that are piecewise constant on a graph, meaning that nodes connected by an edge tend to have identical values. We consider testing for a difference in the means of two connected components estimated using the graph fused lasso. A naive procedure such as a z-test
-
Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Marianne Menictas, Gioia Di Credico, Matt P. Wand
Abstract We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product
-
A Nearest Neighbor Open-set Classifier based on Excesses of Distance Ratios J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Matthys Lucas Steyn, Tertius de Wet, Bernard De Baets, Stijn Luca
Abstract This paper proposes an open-set recognition model that is based on the use of extreme value statistics. For this purpose, a distance ratio is introduced that expresses how dissimilar a target point is from the known classes by considering the ratio of distances locally around the target point. It is shown that the class of generalized Pareto distributions with bounded support can be used to
-
Mutually exciting point process graphs for modelling dynamic networks J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-07-07 Francesco Sanna Passino, Nicholas A. Heard
Abstract A new class of models for dynamic networks is proposed, called mutually exciting point process graphs (MEG). MEG is a scalable network-wide statistical model for point processes with dyadic marks, which can be used for anomaly detection when assessing the significance of future events, including previously unobserved connections between nodes. The model combines mutually exciting point processes
-
Exact Bayesian inference for discretely observed Markov Jump Processes using finite rate matrices J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-29 Chris Sherlock, Andrew Golightly
Abstract We present new methodologies for Bayesian inference on the rate parameters of a discretely observed continuous-time Markov jump process with a countably infinite statespace. The usual method of choice for inference, particle Markov chain Monte Carlo (particle MCMC), struggles when the observation noise is small. We consider the most challenging regime of exact observations and provide two
-
K-CDFs: a Nonparametric Clustering Algorithm via Cumulative Distribution Function J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-20 Jicai Liu, Jinhong Li, Riquan Zhang
We propose a novel partitioning clustering procedure based on the cumulative distribution function (CDF), called K-CDFs. For univariate data, the K-CDFs represent the cluster centers by empirical C...
-
DeepMoM: Robust Deep Learning With Median-of-Means J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-17 Shih-Ting Huang, Johannes Lederer
Abstract Data used in deep learning is notoriously problematic. For example, data are usually combined from diverse sources, rarely cleaned and vetted thoroughly, and sometimes corrupted on purpose. Intentional corruption that targets the weak spots of algorithms has been studied extensively under the label of “adversarial attacks.” In contrast, the arguably much more common case of corruption that
-
High-dimensional Multi-Task Learning using Multivariate Regression and Generalized Fiducial Inference J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-17 Zhenyu Wei, Thomas C. M. Lee
Abstract Over the past decades, the Multi-Task Learning (MTL) problem has attracted much attention in the artificial intelligence and machine learning communities. However, most published work in this area focuses on point estimation; i.e., estimating model parameters and/or making predictions. This paper studies another important aspect of the MTL problem: uncertainty quantification for model choices
-
Numerical Tolerance for Spectral Decompositions of Random Matrices and Applications to Network Inference J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-16 Avanti Athreya, Zachary Lubberts, Carey E. Priebe, Youngser Park, Minh Tang, Vince Lyzinski, Michael Kane, Bryan W. Lewis
Abstract We precisely quantify the impact of statistical error in the quality of a numerical approximation to a random matrix eigendecomposition, and under mild conditions, we use this to introduce an optimal numerical tolerance for residual error in spectral decompositions of random matrices. We demonstrate that terminating an eigendecomposition algorithm when the numerical error and statistical error
-
Fused-Lasso Regularized Cholesky Factors of Large Nonstationary Covariance Matrices of Replicated Time Series J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-16 Aramayis Dallakyan, Mohsen Pourahmadi
Abstract The smoothness of subdiagonals of the Cholesky factor of large covariance matrices is closely related to the degree of nonstationarity of autoregressive models for time series data. Heuristically, one expects for nearly stationary covariance matrix entries in each subdiagonal of the Cholesky factor of its inverse to be approximately the same in the sense that the sum of the absolute values
-
Comparing two samples through stochastic dominance: a graphical approach J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-06 Etor Arza, Josu Ceberio, Ekhiñe Irurozki, Aritz Pérez
Abstract Non-deterministic measurements are common in real-world scenarios: the performance of a stochastic optimization algorithm or the total reward of a reinforcement learning agent in a chaotic environment are just two examples in which unpredictable outcomes are common. These measures can be modeled as random variables and compared among each other via their expected values or more sophisticated
-
Model Checking for Logistic Models When the Number of Parameters Tends to Infinity J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-06 Xinmin Li, Feifei Chen, Hua Liang, David Ruppert
SUMMARY We propose a projection-based test to check logistic regression models when the dimension of the covariate vector may be divergent. The proposed test achieves a reduction in dimension, and the proposed method behaves as if only a single covariate is present. The test is shown to be consistent and can detect root-n local alternatives. We derive the asymptotic distribution of the proposed test
-
An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-06 Jingyi Zhang, Cheng Meng, Jun Yu, Mengrui Zhang, Wenxuan Zhong, Ping Ma
Abstract Subsampling methods aim to select a subsample as a surrogate for the observed sample. Such methods have been used pervasively in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades. Instead of model-based methods, in this paper, we study model-free subsampling methods, which aim to identify a subsample that is not confined by model assumptions. Existing
-
Analysis of professional basketball field goal attempts via a Bayesian matrix clustering approach J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-06-02 Fan Yin, Guanyu Hu, Weining Shen
Abstract We propose a Bayesian nonparametric matrix clustering approach to analyze the latent heterogeneity structure in the shot selection data collected from professional basketball players in the National Basketball Association (NBA). The proposed method adopts a mixture of finite mixtures framework and fully utilizes the spatial information via a mixture of matrix normal distribution representation
-
Popularity Adjusted Block Models are Generalized Random Dot Product Graphs J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-26 John Koo, Minh Tang, Michael W. Trosset
Abstract We connect two random graph models, the Popularity Adjusted Block Model (PABM) and the Generalized Random Dot Product Graph (GRDPG), by demonstrating that the PABM is a special case of the GRDPG in which communities correspond to mutually orthogonal subspaces of latent vectors. This insight allows us to construct new algorithms for community detection and parameter estimation for the PABM
-
Enforcing stationarity through the prior in vector autoregressions J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-23 Sarah E. Heaps
Abstract Stationarity is a very common assumption in time series analysis. A vector autoregressive process is stationary if and only if the roots of its characteristic equation lie outside the unit circle, constraining the autoregressive coefficient matrices to lie in the stationary region. However, the stationary region has a highly complex geometry which impedes specification of a prior distribution
-
Efficient Optimization of Partition Scan Statistics via the Consecutive Partitions Property J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-19 Charles A. Pehlivanian, Daniel B. Neill
Abstract We generalize the spatial and subset scan statistics from the single to the multiple subset case. The two main approaches to defining the log-likelihood ratio statistic in the single subset case – the population-based and expectation-based scan statistics – are considered, leading to risk partitioning and multiple cluster detection scan statistics, respectively. We show that, for distributions
-
Semi-Complete Data Augmentation for Efficient State Space Model Fitting J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-17 Agnieszka Borowska, Ruth King
Abstract We propose a novel efficient model-fitting algorithm for state space models. State space models are an intuitive and flexible class of models, frequently used due to the combination of their natural separation of the different mechanisms acting on the system of interest: the latent underlying system process; and the observation process. This flexibility, however, often comes at the price of
-
Confidence bands for a log-concave density J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-12 Guenther Walther, Alnur Ali, Xinyue Shen, Stephen Boyd
Abstract We present a new approach for inference about a univariate log-concave distribution: Instead of using the method of maximum likelihood, we propose to incorporate the log-concavity constraint in an appropriate nonparametric confidence set for the cdf F. This approach has the advantage that it automatically provides a measure of statistical uncertainty and it thus overcomes a marked limitation
-
Adaptive Handling of Dependence in High-Dimensional Regression Modeling J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-12 Florian Hébert, David Causeur, Mathieu Emily
Abstract Dependence within a high-dimensional profile of explanatory variables affects estimation and prediction performance of regression models. However, the strong belief that dependence should not be ignored, based on our well-proven knowledge of low-dimensional regression modeling, is not necessarily true in high dimension. To investigate this point, we introduce a new class of prediction scores
-
Persistence Flamelets: Topological Invariants for Scale Spaces J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-10 Tullia Padellini, Pierpaolo Brutti
Abstract In recent years there has been noticeable interest in the study of the “shape of data”. Among the many ways a “shape” could be defined, topology is the most general one, as it describes an object in terms of its connectivity structure: connected components (topological features of dimension 0), cycles (features of dimension 1) and so on. There is a growing number of techniques, generally denoted
-
Model Based Screening Embedded Bayesian Variable Selection for Ultra-high Dimensional Settings J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-05 Dongjin Li, Somak Dutta, Vivekananda Roy
Abstract We develop a Bayesian variable selection method, called SVEN, based on a hierarchical Gaussian linear model with priors placed on the regression coefficients as well as on the model space. Sparsity is achieved by using degenerate spike priors on inactive variables, whereas Gaussian slab priors are placed on the coefficients for the important predictors making the posterior probability of a
-
Generalized Connectivity Matrix Response Regression with Applications in Brain Connectivity Studies J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-05 Jingfei Zhang, Will Wei Sun, Lexin Li
Abstract Multiple-subject network data are fast emerging in recent years, where a separate connectivity matrix is measured over a common set of nodes for each individual subject, along with subject covariates information. In this article, we propose a new generalized matrix response regression model, where the observed network is treated as a matrix-valued response and the subject covariates as predictors
-
Scalable Feature Matching Across Large Data Collections J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-05 David Degras
Abstract This paper is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop fast algorithms with time complexity roughly linear in the number n of datasets and space complexity a small fraction of the data size. These remarkable properties hinge
-
A General Method for Deriving Tight Symbolic Bounds on Causal Effects J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-05-02 Michael C. Sachs, Gustav Jonzon, Arvid Sjölander, Erin E. Gabriel
Abstract A causal query will commonly not be identifiable from observed data, in which case no estimator of the query can be contrived without further assumptions or measured variables, regardless of the amount or precision of the measurements of observed variables. However, it may still be possible to derive symbolic bounds on the query in terms of the distribution of observed variables. Bounds, numeric
-
Integrated Depths for Partially Observed Functional Data J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-28 Antonio Elías, Raúl Jiménez, Anna M. Paganoni, Laura M. Sangalli
Integrated Depths for Partially Observed Functional Data. Journal of Computational and Graphical Statistics. Accepted 11 April 2022.
-
Bayesian Kernel Two-Sample Testing J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-28 Qinyi Zhang, Veit Wild, Sarah Filippi, Seth Flaxman, Dino Sejdinovic
Abstract In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where applications are often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference
-
A Distance-preserving Matrix Sketch J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-27 Leland Wilkinson, Hengrui Luo
Abstract Visualizing very large matrices involves many formidable problems. Various popular solutions to these problems involve sampling, clustering, projection, or feature selection to reduce the size and complexity of the original task. An important aspect of these methods is how to preserve relative distances between points in the higher-dimensional space after reducing rows and columns to fit in
-
Dimension Reduction Forests: Local Variable Importance using Structured Random Forests J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Joshua Daniel Loyal, Ruoqing Zhu, Yifan Cui, Xin Zhang
Abstract Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator
-
Double-matched matrix decomposition for multi-view data J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Dongbang Yuan, Irina Gaynanova
Abstract We consider the problem of extracting joint and individual signals from multi-view data, that is, data collected from different sources on matched samples. While existing methods for multi-view data decomposition explore single matching of data by samples, we focus on double-matched multi-view data (matched by both samples and source features). Our motivating example is the miRNA data collected
-
Sample-wise Combined Missing Effect Model with Penalization J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Jialu Li, Guan Yu, Qizhai Li, Yufeng Liu
Abstract Modern high-dimensional statistical inference often faces the problem of missing data. In recent decades, many studies have focused on this topic and provided strategies including complete-sample analysis and imputation procedures. However, complete-sample analysis discards information of incomplete samples, while imputation procedures have accumulative errors from each single imputation.
-
Multiple domain and multiple kernel outcome-weighted learning for estimating individualized treatment regimes J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Shanghong Xie, Thaddeus Tarpey, Eva Petkova, R. Todd Ogden
Abstract Individualized treatment rules (ITRs) recommend treatments that are tailored specifically according to each patient’s own characteristics. It can be challenging to estimate optimal ITRs when there are many features, especially when these features have arisen from multiple data domains (e.g., demographics, clinical measurements, neuroimaging modalities). Considering data from complementary
-
Importance Sampling with the Integrated Nested Laplace Approximation J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Martin Outzen Berild, Sara Martino, Virgilio Gómez-Rubio, Håvard Rue
Abstract The integrated nested Laplace approximation (INLA) is a deterministic approach to Bayesian inference on latent Gaussian models (LGMs) and focuses on fast and accurate approximation of posterior marginals for the parameters in the models. Recently, methods have been developed to extend this class of models to those that can be expressed as conditional LGMs by fixing some of the parameters in
-
Bayesian Distance Weighted Discrimination J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Eric F. Lock
Abstract Distance weighted discrimination (DWD) is a linear discrimination method that is particularly well-suited for classification tasks with high-dimensional data. The DWD coefficients minimize an intuitive objective function, which can be solved efficiently using state-of-the-art optimization techniques. However, DWD has not yet been cast into a model-based framework for statistical inference. In
-
Search Algorithms and Loss Functions for Bayesian Clustering J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 David B. Dahl, Devin J. Johnson, Peter Müller
Abstract We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order
-
Analytic Permutation Testing for functional data ANOVA J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Adam B Kashlak, Sergii Myroshnychenko, Susanna Spektor
Abstract Analysis of variance is a cornerstone of statistical hypothesis testing. When data lies beyond the assumption of univariate normality, nonparametric methods including rank based statistics and permutation tests are enlisted. The permutation test is a versatile exact nonparametric significance test that requires drastically fewer assumptions than similar parametric tests. The main downfall
-
Efficient Learning of Quadratic Variance Function Directed Acyclic Graphs via Topological Layers J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Wei Zhou, Xin He, Wei Zhong, Junhui Wang
Abstract Directed acyclic graph (DAG) models are widely used to represent causal relationships among random variables in many application domains. This paper studies a special class of non-Gaussian DAG models, where the conditional variance of each node given its parents is a quadratic function of its conditional mean. Such a class of non-Gaussian DAG models are fairly flexible and admit many popular
-
Eigen-Adjusted Functional Principal Component Analysis J. Comput. Graph. Stat. (IF 1.884) Pub Date : 2022-04-25 Ci-Ren Jiang, Eardi Lila, John AD Aston, Jane-Ling Wang
Abstract Functional Principal Component Analysis (FPCA) has become a widely-used dimension reduction tool for functional data analysis. When additional covariates are available, existing FPCA models integrate them either in the mean function or in both the mean function and the covariance function. However, methods of the first kind are not suitable for data that display second-order variation, while