
Normal variance mixtures: Distribution, density and parameter estimation
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-15
Erik Hintz; Marius Hofert; Christiane Lemieux
Efficient algorithms for computing the distribution function, (log-)density function and for estimating the parameters of multivariate normal variance mixtures are introduced. For the evaluation of the distribution function, randomized quasi-Monte Carlo (RQMC) methods are utilized in a way that improves upon existing methods proposed for the special case of normal and t distributions. For evaluating

Partition-based feature screening for categorical data via RKHS embeddings
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-14
Jun Lu; Lu Lin; Wen-Wu Wang
This paper proposes a new screening procedure for the ultrahigh dimensional data with a categorical response. By exploiting the group structure among predictors, a new partition-based screening approach is developed via the reproducing kernel Hilbert space (RKHS) embeddings in the maximum mean discrepancy framework. Consequently, the new method is able to identify the influential group of predictors

Optimal treatment regimes for competing risk data using doubly robust outcome weighted learning with bilevel variable selection
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-14
Yizeng He; Soyoung Kim; Mi-Ok Kim; Wael Saber; Kwang Woo Ahn
The goal of the optimal treatment regime is maximizing treatment benefits via personalized treatment assignments based on the observed patient and treatment characteristics. Parametric regression-based outcome learning approaches require exploring complex interplay between the outcome and treatment assignments adjusting for the patient and treatment covariates, yet correctly specifying such relationships

Regression analysis of asynchronous longitudinal data with informative observation processes
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-11
Dayu Sun; Hui Zhao; Jianguo Sun
A great deal of literature has been established for regression analysis of longitudinal data but most of the existing methods assume that covariates can be observed completely or at the same observation times for the response variable, and the observation process is independent of the response variable completely or given covariates. As pointed out by many authors, in practice, one may face the situation

Approximate computation of projection depths
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-09
Rainer Dyckerhoff; Pavlo Mozharovskyi; Stanislav Nagy
Data depth is a concept in multivariate statistics that measures the centrality of a point in a given data cloud in R^d. If the depth of a point can be represented as the minimum of the depths with respect to all one-dimensional projections of the data, then the depth satisfies the so-called projection property. Such depths form an important class that includes many of the depths that have been proposed
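
The projection property described in this entry can be illustrated with a short sketch. It is not the authors' algorithm: it simply takes the minimum of a univariate depth over a finite set of random unit directions, which can only overestimate the true minimum over all directions. The univariate depth used here, 1 / (1 + |z - median| / MAD), the function name `approx_projection_depth`, and the parameters `n_dirs` and `seed` are assumptions chosen for illustration.

```python
import math
import random
from statistics import median


def approx_projection_depth(x, data, n_dirs=500, seed=0):
    """Approximate a projection-type depth of point x w.r.t. data.

    Assumption: the univariate depth is 1 / (1 + |z - med| / MAD).
    The minimum is taken over n_dirs random unit directions only,
    so the result is an upper bound on the exact depth.
    """
    rng = random.Random(seed)
    dim = len(x)
    depth = 1.0
    for _ in range(n_dirs):
        # Draw a direction uniformly on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            continue
        u = [c / norm for c in u]
        # Project the data and the query point onto u.
        proj = [sum(ui * pi for ui, pi in zip(u, p)) for p in data]
        med = median(proj)
        mad = median(abs(v - med) for v in proj)
        dev = abs(sum(ui * xi for ui, xi in zip(u, x)) - med)
        if mad == 0.0:
            if dev == 0.0:
                continue  # degenerate direction, carries no information
            return 0.0    # x is infinitely outlying in this direction
        depth = min(depth, 1.0 / (1.0 + dev / mad))
    return depth


# Hypothetical data cloud, symmetric about the origin.
cloud = [(-1, -1), (-1, 1), (1, -1), (1, 1), (0, 0),
         (2, 0), (-2, 0), (0, 2), (0, -2)]
center_depth = approx_projection_depth((0.0, 0.0), cloud)
far_depth = approx_projection_depth((5.0, 5.0), cloud)
```

In this toy cloud the origin is the centre of symmetry, so it should receive the largest depth, while a point far outside the cloud should receive a much smaller one.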

A new class of stochastic EM algorithms. Escaping local maxima and handling intractable sampling
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-09
Stéphanie Allassonnière; Juliette Chevallier
The expectation–maximization (EM) algorithm is a powerful computational technique for maximum likelihood estimation in incomplete data models. When the expectation step cannot be performed in closed form, a stochastic approximation of EM (SAEM) can be used. The convergence of the SAEM toward critical points of the observed likelihood has been proved and its numerical efficiency has been demonstrated

Community detection via an efficient nonconvex optimization approach based on modularity
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-30
Quan Yuan; Binghui Liu
Maximizing modularity is a widely used method for community detection, which is generally solved by approximate or greedy search because of its high complexity. In this paper, we propose a method, named MSM, for modularity maximization, which reformulates the modularity maximization problem as a subset identification problem and maximizes the surrogate of the modularity. The surrogate of the modularity
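
For readers unfamiliar with the quantity being maximized, the sketch below evaluates the standard Newman-Girvan modularity of a given partition. It does not implement MSM or its surrogate, which the truncated abstract does not specify; the function name `modularity` and the toy graph are assumptions for illustration, and the graph is assumed to be undirected, unweighted and simple.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges:     list of (u, v) pairs, one per undirected edge
    community: dict mapping each node to its community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where L_c is the number of edges inside c, d_c the total degree
    of the nodes in c, and m the number of edges.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # summed degree of the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

Splitting the toy graph into its two triangles gives a positive modularity, whereas placing every node in one community gives zero, which is why an exhaustive search over partitions is attractive in principle but expensive in practice.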

Sum of Kronecker products representation and its Cholesky factorization for spatial covariance matrices from large grids
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-06
Jian Cao; Marc G. Genton; David E. Keyes; George M. Turkiyyah
The sum of Kronecker products (SKP) representation for spatial covariance matrices from gridded observations and a corresponding adaptive-cross-approximation-based framework for building the Kronecker factors are investigated. The time cost for constructing an n-dimensional covariance matrix is O(nk^2) and the total memory footprint is O(nk), where k is the number of Kronecker factors. The memory footprint

Principal component analysis using frequency components of multivariate time series
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2021-01-06
Raanju R. Sundararajan
Dimension reduction techniques for multivariate time series decompose the observed series into a few useful independent/orthogonal univariate components. A spectral domain method is developed for multivariate second-order stationary time series that linearly transforms the observed series into several groups of lower-dimensional multivariate subseries. These multivariate subseries have nonzero spectral

Support vector subset scan for spatial pattern detection
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-13
Dylan Fitzpatrick; Yun Ni; Daniel B. Neill
Discovery of localized and irregularly shaped anomalous patterns in spatial data provides useful context for operational decisions across many policy domains. The support vector subset scan (SVSS) integrates the penalized fast subset scan with a kernel support vector machine classifier to accurately detect spatial clusters without imposing hard constraints on the shape or size of the pattern. The method

Regression analysis of censored data with nonignorable missing covariates and application to Alzheimer Disease
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-27
Mingyue Du; Huiqiong Li; Jianguo Sun
In this paper, we discuss regression analysis of censored failure time data when there exist missing covariates and more specifically, we will consider interval-censored data, a general form of censored data, and the nonignorable missing. Although many methods have been proposed in the literature for censored data with missing covariates, they only apply to limited situations and it does not seem to

Fast Bayesian estimation of spatial count data models
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-13
Prateek Bansal; Rico Krueger; Daniel J. Graham
Spatial count data models are used to explain and predict the frequency of phenomena such as traffic accidents in geographically distinct entities such as census tracts or road segments. These models are typically estimated using Bayesian Markov chain Monte Carlo (MCMC) simulation methods, which, however, are computationally expensive and do not scale well to large datasets. Variational Bayes (VB)

Gaussian Bayesian network comparisons with graph ordering unknown
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-26
Hongmei Zhang; Xianzheng Huang; Shengtong Han; Faisal I. Rezwan; Wilfried Karmaus; Hasan Arshad; John W. Holloway
A Bayesian approach is proposed that unifies Gaussian Bayesian network constructions and comparisons between two networks (identical or differential) for data with graph ordering unknown. When sampling graph ordering, to escape from local maximums, an adjusted single queue equi-energy algorithm is applied. The conditional posterior probability mass function for network differentiation is derived and

A flexible factor analysis based on the class of mean-mixture of normal distributions
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-19
Farzane Hashemi; Mehrdad Naderi; Ahad Jamalizadeh; Andriette Bekker
Factor analysis is a statistical technique for data reduction and structure detection that traditionally relies on the normality assumption for factors. However, due to the presence of non-normal features such as asymmetry and heavy tails in many practical situations, the first two moments cannot adequately explain the factors. An extension of the factor analysis model is introduced by assuming a generalization

Two-sample tests for multivariate functional data with applications
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-19
Zhiping Qiu; Jianwei Chen; Jin-Ting Zhang
Multivariate functional data are frequently obtained in many scientific or industrial areas where several functions for a statistical unit are observed over time. It is often interesting to check if the mean vector functions of two multivariate functional samples are equal. To address this important issue, two global tests for the above two-sample problem for multivariate functional data are proposed

Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-13
Ping Zhou; Zhen Yu; Jingyi Ma; Maozai Tian; Ye Fan
Nowadays, it has become increasingly common to store large-scale data sets distributedly across a great number of clients. The aim of the study is to develop a distributed estimator for generalized linear models (GLMs) in the “large n, diverging p_n” framework with a weak assumption on the number of clients. When the dimension diverges at the rate of o(n), the asymptotic efficiency of the global maximum

SIMEX estimation in parametric modal regression with measurement error
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-19
Jianhong Shi; Yujing Zhang; Ping Yu; Weixing Song
A simulation–extrapolation procedure has been developed for estimating the regression coefficients in a class of parametric modal regression models when the covariates are prone to measurement errors. Large sample properties of the proposed estimator, including the consistency and asymptotic normality, have been thoroughly investigated. Simulation studies and real data applications have been conducted

A mapping-based universal Kriging model for order-of-addition experiments in drug combination studies
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-14
Qian Xiao; Hongquan Xu
In modern pharmaceutical studies, treatments may include several drugs added sequentially, and the drugs’ order-of-addition can have significant impacts on their efficacy. In practice, experiments enumerating all possible drug sequences are often not affordable, and appropriate statistical models which can accurately predict all cases using only a small number of experimental trials are required. A

Efficient inference for stochastic differential equation mixed-effects models using correlated particle pseudo-marginal algorithms
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-08
Samuel Wiqvist; Andrew Golightly; Ashleigh T. McLean; Umberto Picchini
Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that are able to account for random variability inherent in the underlying time-dynamics, as well as the variability between experimental units and, optionally, account for measurement error. Fully Bayesian inference for state-space SDEMEMs is performed, using data at discrete times that may be incomplete

Estimation of high dimensional factor model with multiple threshold-type regime shifts
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-12-17
Jianhong Wu
This paper considers the estimation of high dimensional factor model with multiple threshold-type regime shifts in factor loadings. Firstly, the number of thresholds is determined by comparing the number of factors in the adjacent subintervals. Secondly, the thresholds are estimated one by one by concentrated least squares, and then the factors and loadings are obtained by the principal component method

Generalized co-sparse factor regression
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-03
Aditya Mishra; Dipak K. Dey; Yong Chen; Kun Chen
Multivariate regression techniques are commonly applied to explore the associations between large numbers of outcomes and predictors. In real-world applications, the outcomes are often of mixed types, including continuous measurements, binary indicators, and counts, and the observations may also be incomplete. Building upon the recent advances in mixed-outcome modeling and sparse matrix factorization

A novel method of marginalisation using low discrepancy sequences for integrated nested Laplace approximations
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-21
Paul T. Brown; Chaitanya Joshi; Stephen Joe; Håvard Rue
Recently, it has been shown that the shape of a marginal distribution can be more accurately and efficiently captured using a set of low discrepancy sequence (LDS) points compared to standard grid points. This suggests that the use of LDS could improve the approximation to marginal posterior distributions produced by grid-based Bayesian methods such as the Integrated Nested Laplace Approximation (INLA)

Compromise design for combination experiment of two drugs
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-28
Hengzhen Huang; Xueping Chen
Preclinical experiment on two-drug combination is a stepping stone to multi-drug combination studies. Experimental designs have been proposed in the literature to test the presence of synergism between the combined drugs. However, a design that is efficient for synergy testing is not necessarily desirable for dose–response modeling and the latter is important for future development on drug interaction

Robustness of cost-effectiveness analyses of cluster randomized trials assuming bivariate normality against skewed cost data
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-25
Md Abu Manju; Math J.J.M. Candel; Gerard J.P. van Breukelen
The bivariate normal multilevel model (MLM) provides a flexible modeling framework for cost-effectiveness analyses (CEAs) alongside cluster randomized trials (CRTs) as well as for sample size calculations of these trials. The bivariate MLM assumes a joint normal distribution for effects and costs, both within (individual level) and between (cluster level) clusters. A typical problem in CEAs is that

Semiparametric quantile regression using family of quantile-based asymmetric densities
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-18
Irène Gijbels; Rezaul Karim; Anneleen Verhasselt
Quantile regression is an important tool in data analysis. Linear regression, or more generally, parametric quantile regression imposes often too restrictive assumptions. Nonparametric regression avoids making distributional assumptions, but might have the disadvantage of not exploiting distributional modelling elements that might be brought in. A semiparametric approach towards estimating conditional

Fast inference for semi-varying coefficient models via local averaging
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-18
Heng Peng; Chuanlong Xie; Jingxin Zhao
The semi-varying coefficient models are widely used in the application of finance, economics, medical science and many other areas. In general, the functional coefficients are estimated by local smoothing methods, e.g. local linear estimator. So the computation cost is severe because one should pointwisely estimate the value of a coefficient function. In this paper, we give an insight into the trade-off

Embedding and learning with signatures
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-25
Adeline Fermanian
Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. A novel approach for sequential learning, called the signature method and rooted in rough path theory, is considered. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically

A nonparametric test for comparing conditional ROC curves
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-19
Arís Fanjul-Hevia; Wenceslao González-Manteiga; Juan Carlos Pardo-Fernández
Comparing the accuracy and the behaviour of different diagnostic procedures is one of the main objectives of the Receiver Operating Characteristic (ROC) curve analysis. Along with the diagnostic variables it is usual to observe other covariates, but that extra information has been hardly ever considered for the comparison of this kind of curves. A new nonparametric test is proposed for the comparison

Kendall regression coefficient
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-09
Eckhard Liebscher
A new multivariate extension of Kendall’s dependence coefficient tailored for use in regression analysis is introduced. This coefficient is called Kendall regression coefficient and indicates how well the response variable can be approximated by a strictly increasing function of the regressor (predictor) variables. The properties of this coefficient are examined. In the second part the empirical regression

Dimension-reduced semiparametric estimation of distribution functions and quantiles with nonignorable nonresponse
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-12
Lei Wang; Puying Zhao; Jun Shao
To estimate distribution functions and quantiles of a response variable when the data having nonignorable nonresponse and the dimension of covariate is not low, this article assumes that the propensity follows a general semiparametric model, but the distribution of the response variable and related covariates is unspecified. To address the identifiability problem, an instrumental covariate, which is

Graphical modelling and partial characteristics for multi-type and multivariate-marked spatio-temporal point processes
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-09
Matthias Eckardt; Jonatan A. González; Jorge Mateu
A method for dealing with multivariate analysis of marked spatio-temporal point processes is presented by introducing different partial point characteristics, and by extending the spatial dependence graph model formalism. The approach yields a unified framework for different types of spatio-temporal data, including both, purely qualitatively (multivariate) cases and multivariate cases with additional

Clustering for time-varying relational count data
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-22
Satoshi Goto; Mariko Takagishi; Hiroshi Yadohisa
Relational count data are often obtained from sources such as simultaneous purchase in online shops and social networking service information. Clustering such relational count data reveals the latent structure of the relationship between objects such as household items or people. When relational count data observed at multiple time points are available, it is worthwhile incorporating the time structure

A nonparametric empirical Bayes approach to large-scale multivariate regression
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-09
Yihe Wang; Sihai Dave Zhao
Multivariate regression has many applications, ranging from time series prediction to genomics. Borrowing information across the outcomes can improve prediction error, even when outcomes are statistically independent. Many methods exist to implement this strategy, for example the multiresponse lasso, but choosing the optimal method for a given dataset is difficult. These issues are addressed by establishing

Dimension reduction in binary response regression: A joint modeling approach
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-07
Junlan Li; Tao Wang
Categorical responses cause no conceptual complications for dimension reduction in regression, but the performance of some methods may suffer in this context and hence supervised dimension reduction in practice must recognize the nature of the response. Using a continuous latent variable to represent an unobserved response underlying the binary response, a joint model is proposed for dimension reduction

Decrement rates and a numerical method under competing risks
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-02
Hangsuck Lee; Hongjun Ha; Taewon Lee
Modeling the interactions of competing risks that affect the occurrence of various decrements such as death or disease is an essential issue in survival analysis and actuarial science. Popular assumptions for the construction of decrement models are uniform distributions of decrements in a multiple decrement table (mUDD) and associated single decrement tables (sUDD), respectively. Even though there

Analysis of multivariate longitudinal data using ARMA Cholesky and hypersphere decompositions
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-18
Keunbaik Lee; Chang-Hoon Lee; Min-Sun Kwak; Eun Jin Jang
In longitudinal data with many replications, the high-order autoregressive (AR) structure of covariance matrix is required to capture the serial correlations between repeated outcomes. Thus, the high-order AR structure requires many parameters underlying the dynamic data dependence. In this paper, we proposed an autoregressive moving-average (ARMA) structure of covariance matrix involving multivariate

Iterative GMM for partially linear single-index models with partly endogenous regressors
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-18
Hong-Fan Zhang
In this paper, we consider the estimation method for the partially linear single-index model with endogenous regressors in the linear part. The Generalized Method of Moments (GMM) using instrumental variables is applied to cope with the problem that the parameter estimators may be inconsistent due to endogeneity. The GMM estimation is based on an iterative procedure, which has generalized the well

Causal network learning with non-invertible functional relationships
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-12
Bingling Wang; Qing Zhou
Discovery of causal relationships from observational data is an important problem in many areas. Several recent results have established the identifiability of causal directed acyclic graphs (DAGs) with non-Gaussian and/or nonlinear structural equation models (SEMs). Focusing on nonlinear SEMs defined by non-invertible functions, which exist in many data domains, a novel test is proposed for non-invertible

Angle-based cost-sensitive multicategory classification
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-13
Yi Yang; Yuxuan Guo; Xiangyu Chang
Many real-world classification problems come with costs which can vary for different types of misclassification. It is thus important to develop cost-sensitive classifiers which minimize the total misclassification cost. Although binary cost-sensitive classifiers have been well-studied, solving multicategory classification problems is still challenging. A popular approach to address this issue is to

Variable selection for generalized odds rate mixture cure models with interval-censored failure time data
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-26
Yang Xu; Shishun Zhao; Tao Hu; Jianguo Sun
Variable selection for failure time data with a cured fraction has been discussed by many authors but most of existing methods apply only to right-censored failure time data. In this paper, we consider variable selection when one faces interval-censored failure time data arising from a general class of generalized odds rate mixture cure models, and we propose a penalized variable selection method by

Fast stable parameter estimation for linear dynamical systems
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-28
M. Carey; J.O. Ramsay
Dynamical systems describe changes in processes that arise naturally from their underlying physical principles, such as the laws of motion or the conservation of mass, energy or momentum. These models facilitate a causal explanation for the drivers and impediments of the processes. Extracting these governing equations from data is a central challenge in many diverse areas of science and engineering

Density estimation on a network
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-11-04
Yang Liu; David Ruppert
A novel approach is proposed for density estimation on a network. Nonparametric density estimation on a network is formulated as a nonparametric regression problem by binning. Nonparametric regression using local polynomial kernel-weighted least squares has been studied rigorously, and its asymptotic properties make it superior to kernel estimators such as the Nadaraya–Watson estimator. When applied

Parallel cross-validation: A scalable fitting method for Gaussian process models
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-22
Florian Gerber; Douglas W. Nychka
Gaussian process (GP) models are widely used to analyze spatially referenced data and to predict values at locations without observations. They are based on a statistical framework, which enables uncertainty quantification of the model structure and predictions. Both the evaluation of the likelihood and the prediction involve solving linear systems. Hence, the computational costs are large and limit

Statistical inference for inter-arrival times of extreme events in bursty time series
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-19
Katharina Hees; Smarak Nayak; Peter Straka
In many complex systems studied in statistical physics, inter-arrival times between events such as solar flares, trades and neuron voltages follow a heavy-tailed distribution. The set of event times is fractal-like, being dense in some time windows and empty in others, a phenomenon which has been dubbed “bursty”. A new model for the inter-exceedance times of such events above high thresholds is proposed

Efficient and robust estimation of regression and scale parameters, with outlier detection
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-14
Alain Desgagné
Linear regression with normally distributed errors – including particular cases such as ANOVA, Student’s t-test or location–scale inference – is a widely used statistical procedure. In this case the ordinary least squares estimator possesses remarkable properties but is very sensitive to outliers. Several robust alternatives have been proposed, but there is still significant room for improvement. An

Variable selection in high-dimensional linear model with possibly asymmetric errors
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-14
Gabriela Ciuperca
In many application areas, the problem of the automatic variable selection in a linear model with asymmetric errors is encountered, when the number of explanatory variables diverges with the sample size. For this high-dimensional model, the penalized least squares method is not appropriate and the quantile framework makes the inference more difficult because of the non-differentiability of the loss

Smooth simultaneous confidence band for the error distribution function in nonparametric regression
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-09
Lijie Gu; Suojin Wang; Lijian Yang
A smooth simultaneous confidence band (SCB) is constructed for the distribution of unobserved errors in a nonparametric regression model based on a plug-in kernel distribution estimator. The normalized estimation error process is shown to converge to a Gaussian process. Simulation experiments indicate that the proposed SCB not only strikes an intelligent balance between coverage probability and precision

A Bayesian goodness-of-fit test for regression
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-07
Andrés F. Barrientos; Antonio Canale
Regression models are widely used statistical procedures, and the validation of their assumptions plays a crucial role in the data analysis process. Unfortunately, validating assumptions usually depends on the availability of tests tailored to the specific model of interest. A novel Bayesian goodness-of-fit hypothesis testing approach is presented for a broad class of regression models the response

Outer power transformations of hierarchical Archimedean copulas: Construction, sampling and estimation
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-05
Jan Górecki; Marius Hofert; Ostap Okhrin
Outer power (OP) transformations of Archimedean generators are suggested to increase the modeling flexibility and statistical fitting capabilities of classical Archimedean copulas restricted to a single parameter. For OP-transformed Archimedean copulas, a formula for computing tail dependence coefficients is obtained, as well as two feasible OP Archimedean copula estimators are proposed and their properties

Functional time series model identification and diagnosis by means of auto and partial autocorrelation analysis
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-05
Guillermo Mestre; José Portela; Gregory Rice; Antonio Muñoz San Roque; Estrella Alonso
Quantifying the serial correlation across time lags is a crucial step in the identification and diagnosis of a time series model. Simple and partial autocorrelation functions of the time series are the most widely used tools for this purpose with scalar time series. Nevertheless, there is a lack of an established method for the identification of functional time series (FTS) models. Functional versions

Simplified R-vine based forward regression
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-28
Kailun Zhu; Dorota Kurowicka; Gabriela F. Nane
An extension of the D-vine based forward regression procedure to an R-vine forward regression is proposed. In this extension any R-vine structure can be taken into account. Moreover, a new heuristic is proposed to determine which R-vine structure is the most appropriate to model the conditional distribution of the response variable given the covariates. It is shown in the simulation that the performance

A beyond multiple robust approach for missing response problem
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-13
Qihua Wang; Miaomiao Su; Ruoyu Wang
Imputation and the inverse probability weighting are two commonly used approaches in missing data analysis. Parametric versions of them are not robust due to model misspecification of some unknown functions. Nonparametric ones are robust but are impractical when the number of covariates is large due to the problem of “curse of dimension”. A beyond multiple robust method is proposed in this paper. This

MM algorithms for distance covariance based sufficient dimension reduction and sufficient variable selection
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-28
Runxiong Wu; Xin Chen
Sufficient dimension reduction (SDR) using distance covariance (DCOV) was recently proposed as an approach to dimension-reduction problems. Compared with other SDR methods, it is model-free without estimating link function and does not require any particular distributions on predictors. However, the DCOV-based SDR method involves optimizing a nonsmooth and nonconvex objective function over the Stiefel

Joint generalized estimating equations for longitudinal binary data
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-08
Youjun Huang; Jianxin Pan
Modeling longitudinal binary data is challenging but common in practice. Existing methods on modeling of binary responses take no account of the fact that the correlation coefficient of binary responses must have an upper bound which is smaller than one. Ignoring this fact can lead to incorrect statistical inferences for longitudinal binary data. A novel method is proposed to model the mean and within-subject

Linearly preconditioned nonlinear conjugate gradient acceleration of the PX-EM algorithm
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-04
Lin Zhou; Yayong Tang
The EM algorithm is a widely applicable algorithm for modal estimation but often criticized for its slow convergence. A new hybrid accelerator named APX-EM is proposed for speeding up the convergence of EM algorithm, which is based on both Linearly Preconditioned Nonlinear Conjugate Gradient (PNCG) and PX-EM algorithm. The intuitive idea is that, each step of the PX-EM algorithm can be viewed approximately

Robust variable selection with exponential squared loss for the spatial autoregressive model
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-28
Yunquan Song; Xijun Liang; Yanji Zhu; Lu Lin
Spatial dependent data frequently occur in spatial econometrics and endemiology. In this work, we propose a class of penalized robust regression estimators based on exponential squared loss with independent and identical distributed errors for general spatial autoregressive models. A penalized exponential squared loss with the adaptive lasso penalty is employed for simultaneous model selection and

Link-based survival additive models under mixed censoring to assess risks of hospital-acquired infections
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-10-06
Giampiero Marra; Alessio Farcomeni; Rosalba Radice
The majority of methods available to model survival data only deal with right censoring. However, there are many applications where left, right and/or interval censoring simultaneously occur. A methodology that is capable of handling all types of censoring as well as flexibly estimating several types of covariate effects is presented. The baseline hazard is modelled through monotonic P-splines. The

A self-calibrated direct approach to precision matrix estimation and linear discriminant analysis in high dimensions
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-30
Chi Seng Pun; Matthew Zakharia Hadimaja
A self-calibrated direct estimation algorithm based on ℓ1-regularized quadratic programming is proposed. The self-calibration is achieved by an iterative algorithm for finding the regularization parameter simultaneously with the estimation target. The proposed algorithm is free of cross-validation. Two applications of this algorithm are proposed, namely precision matrix estimation and linear discriminant

Estimation of incident dynamic AUC in practice
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-18
N. van Geloven; Y. He; A.H. Zwinderman; H. Putter
The incident/dynamic time-dependent AUC (Area Under the ROC Curve) is an appealing measure to express the discriminative value of a dynamic survival model over time. However, estimation of this measure is not straightforward. Four recently proposed estimation approaches are studied. In an extensive simulation study, a head-to-head comparison between these four estimation methods is made. The estimation

Goodness-of-fit test for latent block models
Comput. Stat. Data Anal. (IF 1.186) Pub Date: 2020-09-19
Chihiro Watanabe; Taiji Suzuki
Latent block models are used for probabilistic biclustering, which is shown to be an effective method for analyzing various relational data sets. However, there has been no statistical test method for determining the row and column cluster numbers of latent block models. Recent studies have constructed statistical-test-based methods for stochastic block models, which assume that the observed matrix