
A class of Birnbaum–Saunders type kernel density estimators for nonnegative data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-13
Yoshihide Kakizawa
Nonparametric density estimation using a class of deformed skew Birnbaum–Saunders (BS) type kernels is suggested for nonnegative data. A remarkable feature of the new skew BS type kernel density estimators lies in their general formulation via an asymmetry parameter as well as a density generator. Mean integrated squared errors of the proposed estimators are investigated, together with strong consistency and asymptotic

Testing error heterogeneity in censored linear regression Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-31
Caiyun Fan, Wenbin Lu, Yong Zhou
In censored linear regression, a key assumption is that the error is independent of predictors. We develop an omnibus test to check error heterogeneity in censored linear regression. Our approach is based on testing the variance component in a working kernel machine regression model. The limiting null distribution of the proposed test statistic is shown to be a weighted sum of independent chi-squared

Communication-efficient distributed M-estimation with missing data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-16
Jianwei Shi, Guoyou Qin, Huichen Zhu, Zhongyi Zhu
In the big data era, practical applications often encounter incomplete data. Current distributed methods, ignoring missingness, may cause inconsistent estimates. Motivated by that, a distributed algorithm is developed for M-estimation with missing data. The proposed algorithm is communication-efficient, where only gradient information is transferred to the central machine. The parameters of interest

A Bayesian semiparametric vector Multiplicative Error Model Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-15
Nicola Donelli, Antonietta Mira, Stefano Peluso
Interactions among multiple time series of positive random variables are crucial in diverse financial applications, from spillover effects to volatility interdependence. A popular model in this setting is the vector Multiplicative Error Model (vMEM) which poses a linear iterative structure on the dynamics of the conditional mean, perturbed by a multiplicative innovation term. A main limitation of vMEM

Generalized accelerated hazards mixture cure models with interval-censored data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-14
Xiaoyu Liu, Liming Xiang
Existing semiparametric mixture cure models with interval-censored data often assume a survival model, such as the Cox proportional hazards model, proportional odds model, accelerated failure time model, or their transformations for the susceptible subjects. There are cases in practice that such conventional assumptions may be inappropriate for modeling survival outcomes of susceptible subjects. We

Fast Bayesian inference using Laplace approximations in nonparametric double additive location-scale models with right- and interval-censored data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-14
Philippe Lambert
Penalized B-splines are commonly used in additive models to describe smooth changes in a response with quantitative covariates. This is usually done through the conditional mean in the exponential family using generalized additive models with an indirect impact on other conditional moments. Another common strategy is to focus on several low-order conditional moments, leaving the full conditional distribution

Copula Particle Filters Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-03
Carlos E. Rodríguez, Stephen G. Walker
A novel analysis of the state space model is presented. It is shown that by modifying the standard recursive update it is possible to apply a copula model to eliminate a particular integral, which is typically performed using importance sampling. With Bayesian models, copulas have recently been shown to provide predictive densities directly, avoiding integrals altogether. As in every particle filter

Testing the first-order separability hypothesis for spatiotemporal point patterns Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-07
Mohammad Ghorbani, Nafiseh Vafaei, Jiří Dvořák, Mari Myllymäki
First-order separability of a spatiotemporal point process plays a fundamental role in the analysis of spatiotemporal point pattern data. While it is often a convenient assumption that simplifies the analysis greatly, existing nonseparable structures should be accounted for in the model construction. Three different tests are proposed to investigate this hypothesis as a step of preliminary data

Parallel integrative learning for large-scale multi-response regression with incomplete outcomes Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-07
Ruipeng Dong, Daoji Li, Zemin Zheng
Multitask learning is increasingly used to investigate the association structure between multiple responses and a single set of predictor variables in many applications. In the era of big data, the coexistence of incomplete outcomes, large number of responses, and high dimensionality in predictors poses unprecedented challenges in estimation, prediction and computation. In this paper, we propose a

A kernel-based measure for conditional mean dependence Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-09
Tingyu Lai, Zhongzhan Zhang, Yafei Wang
A novel metric, called kernel-based conditional mean dependence (KCMD), is proposed to measure and test the departure from conditional mean independence between a response variable Y and a predictor variable X, based on the reproducing kernel embedding and the Hilbert-Schmidt norm of a tensor operator. The KCMD has several appealing merits. It equals zero if and only if the conditional mean of Y given

In the pursuit of sparseness: A new rank-preserving penalty for a finite mixture of factor analyzers Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-06
Nam-Hwui Kim, Ryan P. Browne
A finite mixture of factor analyzers is an effective method for achieving parsimony in model-based clustering. Introducing a penalization term for the factor loading can lead to sparse estimates. However, in the pursuit of sparseness, one can end up with rank-deficient solutions regardless of the number of factors assumed. In light of this issue, a new penalty-based method that can fit a finite mixture

Robust MAVE through nonconvex penalized regression Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-08
Jing Zhang, Qin Wang, D'Arcy Mays
High dimensionality has been a significant feature in modern statistical modeling. Sufficient dimension reduction (SDR) as an efficient tool aims at reducing the original high dimensional predictors without losing any regression information. Minimum average variance estimation (MAVE) is a popular approach in SDR among others. However, it is not robust to outliers in the response due to the use of least

An ensemble of inverse moment estimators for sufficient dimension reduction Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-08
Qin Wang, Yuan Xue
Sufficient dimension reduction (SDR) is known to be a useful tool in data visualization and information retrieval for high dimensional data. Many well-known SDR approaches investigate the inverse conditional moments of the predictors given the response. Motivated by the idea of the aggregate dimension reduction, we propose an ensemble of inverse moment estimators to explore the central subspace. The

Composite quantile regression for ultrahigh dimensional semiparametric model averaging Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-30
Chaohui Guo, Jing Lv, Jibo Wu
To estimate the joint multivariate regression function, a robust ultrahigh dimensional semiparametric model averaging approach is developed. Specifically, a three-stage estimation procedure is proposed. In the first step, the joint multivariate function can be approximated by a weighted average of one-dimensional marginal regression functions which can be estimated robustly by the composite quantile

Time stable empirical best predictors under a unit-level model Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-22
María Guadarrama, Domingo Morales, Isabel Molina
Comparability as well as stability over time are highly desirable properties of regularly published statistics, especially when they are related to important issues such as people’s living conditions. For instance, poverty statistics displaying drastic changes from one period to the next for the same area have low credibility. In fact, longitudinal surveys that collect information on the same phenomena

Marginal false discovery rate for a penalized transformation survival model Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-04-02
Weijuan Liang, Shuangge Ma, Cunjie Lin
Survival analysis that involves moderate/high dimensional covariates has become common. Most of the existing analyses have been focused on estimation and variable selection, using penalization and other regularization techniques. To draw more definitive conclusions, a handful of studies have also conducted inference. The recently developed mFDR (marginal false discovery rate) technique provides an

Bias-corrected Kullback–Leibler distance criterion based model selection with covariables missing at random Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-31
Yuting Wei, Qihua Wang, Xiaogang Duan, Jing Qin
A model selection problem for the conditional probability function of the response variable Y given the covariable vector (X,Z) is considered under the case where X is missing at random. Two novel model selection criteria are suggested. It is shown that the model selection by these two criteria is consistent and that the population parameter estimators, corresponding to the selected model, are

Two-sample test in high dimensions through random selection Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-16
Tao Qiu, Wangli Xu, Liping Zhu
Testing the equality for two-sample means with high dimensional distributions is a fundamental problem in statistics. In the past two decades, many efforts have been devoted to comparing the mean vectors of two populations. Many existing tests rely on naive diagonal or trace estimators of the covariance matrix, ignoring the dependence structure between variables. To make more use of the dependence

Robust tests for time series comparison based on Laplace periodograms Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-18
Lei Jin
Statistical comparison of time series is useful for the detection of mechanical damage and many other real-world applications. New methods have been proposed to check whether two semistationary time series have the same normalized dynamics. The proposed methods differ from traditional methods in that they are based on the Laplace periodogram, which is a robust tool to analyze the serial dependence

Latent association graph inference for binary transaction data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-27
David Reynolds, Luis Carvalho
A novel approach to the problem of statistical inference for multivariate binary transaction data is proposed. A fundamental question that arises from this data, often referred to as market basket data, is how the items relate to one another. These relationships are naturally expressed by a graph and transactions can be modelled as samples of cliques from this association graph. A hierarchical model

Frequentist delta-variance approximations with mixed-effects models and TMB Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-23
Nan Zheng, Noel Cadigan
Measures of uncertainty are investigated for estimates and predictions using nonlinear mixed-effects models including state–space models in particular. These nonlinear mixed-effects models include fixed parameters and random effects. Maximum likelihood estimation of the parameters and conditional mean predictors of random effects are commonly used to estimate important quantities for a wide spectrum

Bayes linear analysis for ordinary differential equations Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-24
Matthew Jones, Michael Goldstein, David Randell, Philip Jonathan
Differential equation models are used in a wide variety of scientific fields to describe the behaviour of physical systems. Commonly, solutions to given systems of differential equations are not available in closed-form; in such situations, the solution to the system is generally approximated numerically. The numerical solution obtained will be systematically different from the (unknown) true solution

Robust distributed modal regression for massive data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-18
Kangning Wang, Shaomin Li
Modal regression is a good alternative to mean regression and likelihood based methods, because of its robustness and high efficiency. A robust communication-efficient distributed modal regression for the distributed massive data is proposed in this paper. Specifically, the global modal regression objective function is approximated by a surrogate one at the first machine, which relates to the local

Ensemble sparse estimation of covariance structure for exploring genetic disease data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-15
Xiaoning Kang, Mingqiu Wang
High-dimensional data often occur nowadays in various areas, such as genetic and microarray data. The covariance matrix is of fundamental importance in analyzing the relationship between multivariate variables. A powerful tool for estimating a covariance matrix is the modified Cholesky decomposition, which allows for unconstrained estimation and guarantees the positive definiteness of the estimate

FunCC: A new biclustering algorithm for functional data with misalignment Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-22
Marta Galvani, Agostino Torti, Alessandra Menafoglio, Simone Vantini
The problem of biclustering functional data, which has recently been addressed in literature, is considered. A definition of ideal functional bicluster is given and a novel biclustering method, called Functional Cheng and Church (FunCC), is developed. The introduced algorithm searches for non-overlapping and non-exhaustive biclusters in a set of functions which are naturally ordered in matrix structure

Promote sign consistency in the joint estimation of precision matrices Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-04
Qingzhao Zhang, Shuangge Ma, Yuan Huang
The Gaussian graphical model is a popular tool for inferring the relationships among random variables, where the precision matrix provides a natural interpretation of conditional independence. With high-dimensional data, sparsity of the precision matrix is often assumed, and various regularization methods have been applied for estimation. In several scenarios, it is desirable to conduct the joint estimation

Tests for differential Gaussian Bayesian networks based on quadratic inference functions Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-08
Xianzheng Huang, Hongmei Zhang
Hypotheses testing procedures based on quadratic inference functions are proposed to test whether two Gaussian Bayesian networks are differential in structure, strength of associations between nodes, or both. Bootstrap procedures are developed to estimate p-values to quantify the statistical significance of the tests. Operating characteristics of these testing procedures are investigated using synthetic

Hidden semi-Markov-switching quantile regression for time series Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-05
Antonello Maruotti, Lea Petrella, Luca Sposito
A hidden semi-Markov-switching quantile regression model is introduced as an extension of the hidden Markov-switching one. The proposed model allows for arbitrary sojourn-time distributions in the states of the Markov-switching chain. Parameter estimation is carried out via the maximum likelihood estimation method using the Asymmetric Laplace distribution. As a by-product of the model specification, the

Hypothesis testing of varying coefficients for regional quantiles Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-04
Seyoung Park, Eun Ryung Lee
Testing the behavior of varying coefficients (VC) over a range of quantiles is important in the field of regression analysis. This study tests whether coefficient functions in varying quantile regression share common structural information across a certain range of quantile levels, even when linear combinations of covariates are unspecified in the null hypothesis. Our approach allows varying the coefficients

Nonparametric density estimation and bandwidth selection with B-spline bases: A novel Galerkin method Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-05
J. Lars Kirkby, Álvaro Leitao, Duy Nguyen
A general and efficient nonparametric density estimation procedure for local bases, including B-splines, is proposed, which employs a novel statistical Galerkin method combined with basis duality theory. To select the bandwidth, an efficient cross-validation procedure is introduced, based on closed-form expressions in terms of the primal and dual B-spline basis. By utilizing a closed-form expression

Deep distribution regression Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-22
Rui Li, Brian J. Reich, Howard D. Bondell
Due to their flexibility and predictive performance, machine-learning based regression methods have become an important tool for predictive modeling and forecasting. However, most methods focus on estimating the conditional mean or specific quantiles of the target quantity and do not provide the full conditional distribution, which contains uncertainty information that might be crucial for decision

Censored mean variance sure independence screening for ultrahigh dimensional survival data Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-24
Wei Zhong, Jiping Wang, Xiaolin Chen
Feature screening has become an indispensable statistical modeling tool for ultrahigh dimensional data analysis. This article introduces a new model-free marginal feature screening approach for ultrahigh dimensional survival data with right censoring. The new procedure could be used for survival data with both ultrahigh dimensional categorical and continuous covariates. Motivated by Cui et al. (2015)

Subgroup causal effect identification and estimation via matching tree Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-23
Yuyang Zhang, Patrick Schnell, Chi Song, Bin Huang, Bo Lu
Inferring causal effect from observational studies is a central topic in many scientific fields, including social science, health and medicine. The statistical methodology for estimating population average causal effect has been well established. However, the methods for identifying and estimating subpopulation causal effects are relatively less developed. Part of the challenge is that the subgroup

Generalized k-means in GLMs with applications to the outbreak of COVID-19 in the United States Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-03-10
Tonglin Zhang, Ge Lin
Generalized k-means can be combined with any similarity or dissimilarity measure for clustering. Using the well known likelihood ratio or F-statistic as the dissimilarity measure, a generalized k-means method is proposed to group generalized linear models (GLMs) for exponential family distributions. Given the number of clusters k, the proposed method is established by the uniform most powerful unbiased

A new class of stochastic EM algorithms. Escaping local maxima and handling intractable sampling Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-09
Stéphanie Allassonnière, Juliette Chevallier
The expectation–maximization (EM) algorithm is a powerful computational technique for maximum likelihood estimation in incomplete data models. When the expectation step cannot be performed in closed form, a stochastic approximation of EM (SAEM) can be used. The convergence of the SAEM toward critical points of the observed likelihood has been proved and its numerical efficiency has been demonstrated

Tuning-free ridge estimators for high-dimensional generalized linear models Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-28
Shih-Ting Huang, Fang Xie, Johannes Lederer
Ridge estimators regularize the squared Euclidean lengths of parameters. Such estimators are mathematically and computationally attractive but involve tuning parameters that need to be calibrated. It is shown that ridge estimators can be modified such that tuning parameters can be avoided altogether, and the resulting estimator can improve on the prediction accuracies of standard ridge estimators combined

Dissimilarity functions for rank-invariant hierarchical clustering of continuous variables Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-13
Sebastian Fuchs, F. Marta L. Di Lascio, Fabrizio Durante
A theoretical framework is presented for a (copula-based) notion of dissimilarity between continuous random vectors and its main properties are studied. The proposed dissimilarity assigns the smallest value to a pair of random vectors that are comonotonic. Various properties of this dissimilarity are studied, with special attention to those that are prone to the hierarchical agglomerative methods,

Clusterwise functional linear regression models Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-14
Ting Li, Xinyuan Song, Yingying Zhang, Hongtu Zhu, Zhongyi Zhu
Classical clusterwise linear regression is a useful method for investigating the relationship between scalar predictors and scalar responses with heterogeneous variation of regression patterns for different subgroups of subjects. This paper extends the classical clusterwise linear regression to incorporate multiple functional predictors by representing the functional coefficients in terms of a functional

Clustering with the Average Silhouette Width Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-10
Fatima Batool, Christian Hennig
The Average Silhouette Width (ASW) is a popular cluster validation index to estimate the number of clusters. The question of whether it is also suitable as a general objective function to be optimized for finding a clustering is addressed. Two algorithms (the standard version OSil and a fast version FOSil) are proposed, and they are compared with existing clustering methods in an extensive simulation

High dimensional regression for regenerative time-series: An application to road traffic modeling Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-11
Mohammed Bouchouia, François Portier
A statistical predictive model in which a high-dimensional time-series regenerates at the end of each day is used to model road traffic. Due to the regeneration, prediction is based on a daily modeling using a vector autoregressive model that combines linearly the past observations of the day. Due to the high-dimension, the learning algorithm follows from an ℓ1-penalization of the regression coefficients

Estimating robot strengths with application to selection of alliance members in FIRST robotics competitions Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-12
Alejandro Lim, Chin-Tsang Chiang, Jen-Chieh Teng
Since the inception of the FIRST Robotics Competition (FRC) and its special playoff system, robotics teams have longed to appropriately quantify the strengths of their designed robots. The FRC includes a playground draft-like phase (alliance selection), arguably the most game-changing part of the competition, in which the top-8 robotics teams in a tournament based on the FRC’s ranking system assess

Response adaptive designs for Phase II trials with binary endpoint based on context-dependent information measures Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-30
Ksenia Kasianova, Mark Kelbert, Pavel Mozgunov
In many rare disease Phase II clinical trials, two objectives are of interest to an investigator: maximising the statistical power and maximising the number of patients responding to the treatment. These two objectives are competing; therefore, clinical trial designs offering a balance between them are needed. Recently, it was argued that response-adaptive designs such as families of multi-arm bandit

Explicit-duration Hidden Markov Models for quantum state estimation Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-10
Alessandra Luati, Marco Novelli
An explicit-duration Hidden Markov Model with a nonparametric kernel estimator of the state duration distribution is specified. The motivation comes from the physical problem of extracting the maximum information from an open quantum system subject to an external perturbation, which induces a change in the dynamics of the system. A nonparametric kernel estimator for discrete data is introduced, which

Robust designs for dose–response studies: Model and labelling robustness Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-05
Douglas P. Wiens
Methods for the construction of dose–response designs are presented that are robust against possible model misspecifications and mislabelled responses. The asymptotic properties are studied, leading to asymptotically minimax designs that minimize the maximum – over neighbourhoods of both types of model inadequacies – value of the mean squared error of the predictions. Both sequential and adaptive approaches

Confidence intervals for spatial scan statistic Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-02-12
Ivair R. Silva, Luiz Duczmal, Martin Kulldorff
The spatial scan statistic is a popular statistical tool to detect geographical clusters of diseases. The basic problem of constructing confidence intervals for the relative risk of the most likely cluster has remained an open question. To fill this gap, a Monte Carlo based interval estimator for the relative risk of the primary cluster is derived. The method works for the circular spatial scan statistic

Robust variable selection for model-based learning in presence of adulteration Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-26
Andrea Cappozzo, Francesca Greselin, Thomas Brendan Murphy
The problem of identifying the most discriminating features when performing supervised learning has been extensively investigated. In particular, several methods for variable selection have been proposed in model-based classification. The impact of outliers and wrongly labeled units on the determination of relevant predictors has instead received far less attention, with almost no dedicated methodologies

Computation of projection regression depth and its induced median Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-27
Yijun Zuo
Notions of depth in regression have been introduced and studied in the literature. The most famous example is Regression Depth (RD), which is a direct extension of location depth to regression. The projection regression depth (PRD) is the extension of another prevailing location depth, the projection depth, to regression. The computation issues of the RD have been discussed in the literature. The computation

Optimal treatment regimes for competing risk data using doubly robust outcome weighted learning with bi-level variable selection Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-14
Yizeng He, Soyoung Kim, Mi-Ok Kim, Wael Saber, Kwang Woo Ahn
The goal of the optimal treatment regime is maximizing treatment benefits via personalized treatment assignments based on the observed patient and treatment characteristics. Parametric regression-based outcome learning approaches require exploring complex interplay between the outcome and treatment assignments adjusting for the patient and treatment covariates, yet correctly specifying such relationships

Mixture of linear experts model for censored data: A novel approach with scale-mixture of normal distributions Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-24
Elham Mirfarah, Mehrdad Naderi, Ding-Geng Chen
The mixture of linear experts (MoE) model is one of the widespread statistical frameworks for modeling, classification, and clustering of data. Built on the normality assumption of the error terms for mathematical and computational convenience, the classical MoE model has two challenges: (1) it is sensitive to atypical observations and outliers, and (2) it might produce misleading inferential results for

Unsupervised image segmentation with Gaussian Pairwise Markov Fields Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-21
Hugo Gangloff, Jean-Baptiste Courbot, Emmanuel Monfrini, Christophe Collet
Modeling strongly correlated random variables is a critical task in the context of latent variable models. A new probabilistic model, called Gaussian Pairwise Markov Field, is presented to generalize existing Markov Fields latent variables models, and to introduce more correlations between variables. This is done by considering the correlations within Gaussian Markov Random Fields models which are

A stochastic block model approach for the analysis of multilevel networks: An application to the sociology of organizations Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-26
Saint-Clair Chabert-Liddell, Pierre Barbillon, Sophie Donnet, Emmanuel Lazega
A multilevel network is defined as the junction of two interaction networks, one level representing the interactions between individuals and the other the interactions between organizations. The levels are linked by an affiliation relationship, each individual belonging to a unique organization. A new Stochastic Block Model is proposed as a unified probabilistic framework tailored for multilevel networks

Variable selection in finite mixture of regression models with an unknown number of components Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-26
Kuo-Jung Lee, Martin Feldkircher, Yi-Chi Chen
A Bayesian framework for finite mixture models to deal with model selection and the selection of the number of mixture components simultaneously is presented. For that purpose, a feasible reversible jump Markov Chain Monte Carlo algorithm is proposed to model each component as a sparse regression model. This approach is made robust to outliers by using a prior that induces heavy tails and works well

Testing conditional mean through regression model sequence using Yanai’s generalized coefficient of determination Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-19
Masao Ueki
In high-dimensional data analysis such as in genomics, repeated univariate regression for each variable is utilized to screen useful variables. However, signals jointly detectable with other variables may be overlooked. While the saturated model using all variables may not work in high-dimensional data, based on prior knowledge, groupwise analysis for a predefined group is often developed, but the

Approximate computation of projection depths Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-09
Rainer Dyckerhoff, Pavlo Mozharovskyi, Stanislav Nagy
Data depth is a concept in multivariate statistics that measures the centrality of a point in a given data cloud in R^d. If the depth of a point can be represented as the minimum of the depths with respect to all one-dimensional projections of the data, then the depth satisfies the so-called projection property. Such depths form an important class that includes many of the depths that have been proposed

Partition-based feature screening for categorical data via RKHS embeddings Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-14
Jun Lu, Lu Lin, Wen-Wu Wang
This paper proposes a new screening procedure for the ultrahigh dimensional data with a categorical response. By exploiting the group structure among predictors, a new partition-based screening approach is developed via the reproducing kernel Hilbert space (RKHS) embeddings in the maximum mean discrepancy framework. Consequently, the new method is able to identify the influential group of predictors

An exchange algorithm for optimal calibration of items in computerized achievement tests Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-18
Mahmood Ul Hassan, Frank Miller
The importance of large scale achievement tests, like national tests in school, eligibility tests for university, or international assessments for evaluation of students, is increasing. Pretesting of questions for the above mentioned tests is done to determine characteristic properties of the questions by adding them to an ordinary achievement test. If computerized tests are used, it has been shown

Sum of Kronecker products representation and its Cholesky factorization for spatial covariance matrices from large grids Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-06
Jian Cao, Marc G. Genton, David E. Keyes, George M. Turkiyyah
The sum of Kronecker products (SKP) representation for spatial covariance matrices from gridded observations and a corresponding adaptive-cross-approximation-based framework for building the Kronecker factors are investigated. The time cost for constructing an n-dimensional covariance matrix is O(nk^2) and the total memory footprint is O(nk), where k is the number of Kronecker factors. The memory footprint

Normal variance mixtures: Distribution, density and parameter estimation Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-15
Erik Hintz, Marius Hofert, Christiane Lemieux
Efficient algorithms for computing the distribution function, (log-)density function and for estimating the parameters of multivariate normal variance mixtures are introduced. For the evaluation of the distribution function, randomized quasi-Monte Carlo (RQMC) methods are utilized in a way that improves upon existing methods proposed for the special case of normal and t distributions. For evaluating

Regression analysis of asynchronous longitudinal data with informative observation processes Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-11
Dayu Sun, Hui Zhao, Jianguo Sun
A great deal of literature has been established for regression analysis of longitudinal data but most of the existing methods assume that covariates can be observed completely or at the same observation times for the response variable, and the observation process is independent of the response variable completely or given covariates. As pointed out by many authors, in practice, one may face the situation

Principal component analysis using frequency components of multivariate time series Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-06
Raanju R. Sundararajan
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. A spectral domain method is developed for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have nonzero spectral