-
Factor selection in screening experiments by aggregation over random models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-24 Rakhi Singh, John Stufken
Screening experiments are useful for identifying a small number of truly important factors from a large number of potentially important factors. The Gauss-Dantzig Selector (GDS) is often the preferred analysis method for screening experiments. Just considering main-effects models can result in erroneous conclusions, but including interaction terms, even if restricted to two-factor interactions, increases
-
Inference on order restricted means of inverse Gaussian populations under heteroscedasticity Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-23 Anjana Mondal, Somesh Kumar
The hypothesis testing problem of homogeneity of inverse Gaussian means against ordered alternatives is studied when nuisance or scale-like parameters are unknown and unequal. The maximum likelihood estimators (MLEs) of means and scale-like parameters are obtained when means satisfy some simple order restrictions and scale-like parameters are unknown and unequal. An iterative algorithm is proposed
-
Sequential estimation for mixture of regression models for heterogeneous population Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-23 Na You, Hongsheng Dai, Xueqin Wang, Qingyun Yu
Heterogeneity among patients commonly exists in clinical studies and leads to challenges in medical research. It is widely accepted that there exist various sub-types in the population and they are distinct from each other. The approach of identifying the sub-types and thus tailoring disease prevention and treatment is known as precision medicine. The mixture model is a classical statistical model
-
A stochastic process representation for time warping functions Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-20 Yijia Ma, Xinyu Zhou, Wei Wu
-
Block-wise primal-dual algorithms for large-scale doubly penalized ANOVA modeling Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-15 Penghui Fu, Zhiqiang Tan
For multivariate nonparametric regression, doubly penalized ANOVA modeling (DPAM) has recently been proposed, using hierarchical total variations (HTVs) and empirical norms as penalties on the component functions such as main effects and multi-way interactions in a functional ANOVA decomposition of the underlying regression function. The two penalties play complementary roles: the HTV penalty promotes
-
Flexible regularized estimation in high-dimensional mixed membership models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-09 Nicholas Marco, Damla Şentürk, Shafali Jeste, Charlotte C. DiStefano, Abigail Dickinson, Donatello Telesca
Mixed membership models are an extension of finite mixture models, where each observation can partially belong to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent
-
Parameter Estimation and Random Number Generation for Student Lévy Processes Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-08 Li Shuaiyu, Wu Yunpei, Cheng Yuzhong
To address the challenges in estimating parameters of the widely applied Student-Lévy process, the study introduces two distinct methods: a likelihood-based approach and a data-driven approach. A two-step quasi-likelihood-based method is initially proposed, countering the non-closed nature of the Student-Lévy process's distribution function under convolution. This method utilizes the limiting properties
-
Heterogeneous quantile regression for longitudinal data with subgroup structures Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-29 Zhaohan Hou, Lei Wang
Subgroup analysis for modeling longitudinal data with heterogeneity across all individuals has drawn attention in modern statistical learning. In this paper, we focus on a heterogeneous quantile regression model and propose to achieve variable selection, heterogeneous subgrouping and parameter estimation simultaneously, by using the smoothed generalized estimating equations in conjunction with the
-
Goodness-of-fit test for point processes first-order intensity Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-02 M.I. Borrajo, W. González-Manteiga, M.D. Martínez-Miranda
Modelling the first-order intensity function is one of the main aims in point process theory. An appropriate model describes the first-order intensity as a nonparametric function of spatial covariates. A formal testing procedure is presented to assess the goodness-of-fit of this model, assuming an inhomogeneous Poisson point process. The test is based on a quadratic distance between two kernel intensity
-
Robust heavy-tailed versions of generalized linear models with applications in actuarial science Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-02-02 Philippe Gagnon, Yuxi Wang
Generalized linear models (GLMs) form one of the most popular classes of models in statistics. The gamma variant is used, for instance, in actuarial science for the modelling of claim amounts in insurance. A flaw of GLMs is that they are not robust against outliers (i.e., against erroneous or extreme data points). A difference in trends in the bulk of the data and the outliers thus yields skewed inference
-
A unified framework of analyzing missing data and variable selection using regularized likelihood Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-26 Yuan Bian, Grace Y. Yi, Wenqing He
Missing data arise commonly in applications, and research on this topic has received extensive attention in the past few decades. Various inference methods have been developed under different missing data mechanisms, including missing at random and missing not at random. The assessment of a feasible missing data mechanism is, however, difficult due to the lack of validation data. The problem is further
-
A simple approach for local and global variable importance in nonlinear regression models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-22 Emily T. Winn-Nuñez, Maryclare Griffin, Lorin Crawford
The ability to interpret machine learning models has become increasingly important as their usage in data science continues to rise. Most current interpretability methods are optimized to work on either (i) a global scale, where the goal is to rank features based on their contributions to overall variation in an observed population, or (ii) the local level, which aims to detail how important a feature
-
Post-clustering difference testing: Valid inference and practical considerations with applications to ecological and biological data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-17 Benjamin Hivert, Denis Agniel, Rodolphe Thiébaut, Boris P. Hejblum
Clustering is part of unsupervised analysis methods that group samples into homogeneous and separate subgroups of observations also called clusters. To interpret the clusters, statistical hypothesis testing is often used to infer the variables that significantly separate the estimated clusters from each other. However, data-driven hypotheses are thus used for the inference process because the hypotheses
-
Oracle-efficient estimation and trend inference in non-stationary time series with trend and heteroscedastic ARMA error Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-09 Chen Zhong
Non-stationary time series often contain an unknown trend and unobserved error terms. The error terms in the proposed model consist of a smooth variance function and a latent stationary ARMA series, which allows heteroscedasticity at different time points. A theoretically justified two-step B-spline estimation method is proposed for the trend and variance function in the model, and then residuals
-
Integrated subgroup identification from multi-source data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-11 Lihui Shao, Jiaqi Wu, Weiping Zhang, Yu Chen
Subgroup identification is crucial in dealing with the heterogeneous population and has wide applications in various areas, such as clinical trials and market segmentation. With the prevalence of multi-source data, there is a practical need to identify subgroups based on multi-source data. This paper proposes a working-independence pseudo-loglikelihood and integrates the parameters of each source into
-
Change point detection via feedforward neural networks with theoretical guarantees Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-09 Houlin Zhou, Hanbing Zhu, Xuejun Wang
This article mainly studies change point detection for the mean shift change point model. An estimation method is proposed to estimate the change point via feedforward neural networks. The complete f-moment consistency of the proposed estimator is obtained. Numerical simulation results show that the performance of the proposed estimator is better than that of the cumulative sum type estimator which is widely
-
Generalized latent space model for one-mode networks with awareness of two-mode networks Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-10 Xinyan Fan, Kuangnan Fang, Dan Pu, Ruixuan Qin
Latent space models have been widely studied for one-mode networks, in which the same type of nodes connect with each other. In many applications, one-mode networks are often observed along with two-mode networks, which reflect connections between different types of nodes and provide important information for understanding the one-mode network structure. However, the classical one-mode latent space
-
Empirical likelihood in a partially linear single-index model with censored response data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2024-01-10 Liugen Xue
An empirical likelihood (EL) approach for a partial linear single-index model with censored response data is studied. A bias-corrected EL ratio is proposed, and the asymptotic chi-squared distribution of this ratio is obtained. The result can be directly used to construct the confidence regions of the regression parameters. The estimators of regression parameters and link function are constructed,
-
HiQR: An efficient algorithm for high-dimensional quadratic regression with penalties Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-28 Cheng Wang, Haozhe Chen, Binyan Jiang
This paper investigates the efficient solution of penalized quadratic regressions in high-dimensional settings. A novel and efficient algorithm for ridge-penalized quadratic regression is proposed, leveraging the matrix structures of the regression with interactions. Additionally, an alternating direction method of multipliers (ADMM) framework is developed for penalized quadratic regression with general
-
Group variable selection via group sparse neural network Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-29 Xin Zhang, Junlong Zhao
Group variable selection is an important issue in high-dimensional data modeling, and most existing methods consider only the linear model. Therefore, a new method based on the deep neural network (DNN), an increasingly popular nonlinear method in both statistics and deep learning communities, is proposed. The method is applicable to general nonlinear models, including the linear model as a special
-
Subgroup detection based on partially linear additive individualized model with missing data in response Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-20 Tingting Cai, Jianbo Li, Qin Zhou, Songlou Yin, Riquan Zhang
Based on a partially linear additive individualized model, a fusion-penalized inverse probability weighted least squares method is proposed to detect subgroups when data are missing in the response. Firstly, the B-spline technique is used to approximate the unknown additive individualized functions and then an inverse probability weighted quadratic loss function is established with fusion penalty on the difference
-
A Laplace-based model with flexible tail behavior Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-12 Cristina Tortora, Brian C. Franczak, Luca Bagnato, Antonio Punzo
The proposed multiple scaled contaminated asymmetric Laplace (MSCAL) distribution is an extension of the multivariate asymmetric Laplace distribution to allow for a different excess kurtosis on each dimension and for more flexible shapes of the hyper-contours. These peculiarities are obtained by working on the principal component (PC) space. The structure of the MSCAL distribution has the further advantage
-
Simultaneous inference and uniform test for eigensystems of functional data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-10 Leheng Cai, Qirui Hu
The asymptotically correct confidence interval (CI) and simultaneous confidence band (SCB) of any individual eigenvalue and eigenfunction are constructed under dense functional data through B-spline smoothing. Besides, uniform inference procedures for eigensystems with a diverging number of components are novelly developed. The proposed estimators for functional eigensystems employ “oracle” efficiency
-
Discrepancy between structured matrices in the power analysis of a separability test Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-13 Katarzyna Filipiak, Daniel Klein, Monika Mokrzycka
An important task in the analysis of multivariate data is testing of the covariance matrix structure. In particular, for assessing separability, various tests have been proposed. However, the development of a method of measuring discrepancy between two covariance matrix structures, in relation to the study of the power of the test, remains an open problem. Therefore, a discrepancy measure is proposed
-
Graph-based spatial segmentation of areal data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-12 Vivien Goepp, Jan van de Kassteele
Smoothing is often used to improve the readability and interpretability of noisy areal data. However, there are many instances where the underlying quantity is discontinuous. For such cases, specific methods are needed to estimate the piecewise constant spatial process. A well-known approach in this setting is to perform segmentation of the signal using the adjacency graph, such as the graph-based
-
Estimation of l0 norm penalized models: A statistical treatment Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-06 Yuan Yang, Christopher S. McMahan, Yu-Bo Wang, Yuyuan Ouyang
Fitting penalized models for the purpose of merging the estimation and model selection problem has become commonplace in statistical practice. Of the various regularization strategies that can be leveraged to this end, the use of the l0 norm to penalize parameter estimation poses the most daunting model fitting task. In fact, this particular strategy requires an end user to solve a non-convex NP-hard
-
Hierarchical false discovery rate control for high-dimensional survival analysis with interactions Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-05 Weijuan Liang, Qingzhao Zhang, Shuangge Ma
With the development of data collection techniques, analysis with a survival response and high-dimensional covariates has become routine. Here we consider an interaction model, which includes a set of low-dimensional covariates, a set of high-dimensional covariates, and their interactions. This model has been motivated by gene-environment (G-E) interaction analysis, where the E variables have a low
-
Two-sample test of stochastic block models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-07 Qianyong Wu, Jiang Hu
In this paper, we consider the problem of two-sample test of large networks with community structures. A test statistic is proposed based on the maximum entry of the difference between the two adjacency matrices. Asymptotic null distribution is derived, and the asymptotic power guarantee against the alternative hypothesis is provided. The simulations and real data examples show that the proposed test
-
Multi-block alternating direction method of multipliers for ultrahigh dimensional quantile fused regression Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-28 Xiaofei Wu, Hao Ming, Zhimin Zhang, Zhenyu Cui
In this paper, we consider a quantile fused LASSO regression model that combines quantile regression loss with the fused LASSO penalty. Intuitively, this model offers robustness to outliers, thanks to the quantile regression, while also effectively recovering sparse and block coefficients through the fused LASSO penalty. To adapt our proposed method for ultrahigh dimensional datasets, we introduce
-
Robust and adaptive functional logistic regression Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-12-01 Ioannis Kalogridis
A novel family of robust estimators for the functional logistic regression model is introduced and studied. The estimators are based on the concept of density power divergence between densities and may be formed with any combination of lower rank approximations and penalties, as the need arises. Uniform convergence and high rates of convergence with respect to the commonly used prediction error are
-
A fast trans-lasso algorithm with penalized weighted score function Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-25 Xianqiu Fan, Jun Cheng, Hailing Wang, Bin Zhang, Zhenzhen Chen
An efficient transfer learning algorithm for high-dimensional sparse logistic regression models is proposed using penalized weighted score function based on square root Lasso, which intends to prespecify the tuning parameter. Three different choices of the tuning parameter are considered in the case of fixed design matrix. With a novel weight construction, the estimator of the regression vector is
-
Efficient and robust optimal design for quantile regression based on linear programming Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-20 Cheng Peng, Drew P. Kouri, Stan Uryasev
When informing decisions with experimental data, it is often necessary to quantify the distribution tails of uncertain system responses using limited data. To maximize the information content of the data, one is naturally led to use experimental design. However, common design techniques minimize global statistics such as the average estimation or prediction variance. Novel methods for optimal experimental
-
Simultaneous confidence region of an embedded one-dimensional curve in multi-dimensional space Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-23 Hiroya Yamazoe, Kanta Naito
This paper focuses on the simultaneous confidence region of a one-dimensional curve embedded in multi-dimensional space. Local linear regression is applied component-wise to each variable in multi-dimensional data, which yields an estimator of the one-dimensional curve. A simultaneous confidence region of the curve is proposed based on this estimator and theoretical results for the estimator and the
-
Clustering-based inter-regional correlation estimation Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-10 Hanâ Lbath, Alexander Petersen, Wendy Meiring, Sophie Achard
A novel non-parametric estimator of the correlation between grouped measurements of a quantity is proposed in the presence of noise. The main motivation is functional brain network construction from fMRI data, where brain regions correspond to groups of spatial units, and correlation between region pairs defines the network. The challenge resides in the fact that both noise and intra-regional correlation
-
Conditional-mean multiplicative operator models for count time series Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-07 Christian H. Weiß, Fukang Zhu
Multiplicative error models (MEMs) are commonly used for real-valued time series, but they cannot be applied to discrete-valued count time series as the involved multiplication would not preserve the integer nature of the data. Thus, the concept of a multiplicative operator for counts is proposed (as well as several specific instances thereof), which are then used to develop a kind of MEMs for count
-
One point per cluster spatially balanced sampling Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-07 Blair Robertson, Chris Price
A spatial sampling design determines where sample locations are placed in a study area so that population parameters can be estimated with relatively high precision. Spatially balanced designs have good spatial spread and give precise results for commonly used estimators when surveying natural resources. A new design is proposed which draws spatially balanced samples from stratified and unstratified
-
Variable selection for high-dimensional incomplete data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-10 Lixing Liang, Yipeng Zhuang, Philip L.H. Yu
Regression analysis is often affected by high dimensionality, severe multicollinearity, and a large proportion of missing data. These problems may mask important relationships and even lead to biased conclusions. This paper proposes a novel computationally efficient method that integrates data imputation and variable selection to address these issues. More specifically, the proposed method incorporates
-
Nearest neighbors weighted composite likelihood based on pairs for (non-)Gaussian massive spatial data with an application to Tukey-hh random fields estimation Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-10 Christian Caamaño-Carrillo, Moreno Bevilacqua, Cristian López, Víctor Morales-Oñate
A highly scalable method for (non-)Gaussian random fields estimation is proposed. In particular, a novel (a)symmetric weight function based on nearest neighbors for the method of maximum weighted composite likelihood based on pairs (WCLP) is studied. The new weight function allows estimating massive (up to millions) spatial datasets and improves the statistical efficiency of the WCLP method using
-
Nonparametric augmented probability weighting with sparsity Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-10 Xin He, Xiaojun Mao, Zhonglei Wang
Nonresponse frequently arises in practice, and simply ignoring it may lead to erroneous inference. Besides, the number of collected covariates may increase with the sample size in modern statistics, so parametric imputation or propensity score weighting usually leads to estimation inefficiency and introduces a large variability without consideration of sparsity. In this paper, we propose a nonparametric
-
Burn-in selection in simulating stationary time series Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-10 Yuanbo Li, Chu Kin Chan, Chun Yip Yau, Wai Leong Ng, Henry Lam
Many time series models are defined in a recursive manner, which prohibits exact simulations. In practice, one appeals to simulating a long time series and discarding a large number of initial simulated observations, known as the burn-in. For autoregressive models where the dependence decays exponentially fast, the choice of the burn-in is not critical. However, for long-memory time series where the
-
Bayesian Boundary Trend Filtering Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-11-07 Takahiro Onizuka, Fumiya Iwashige, Shintaro Hashimoto
Estimating boundary curves has many applications in fields such as economics, climate science, and medicine. Bayesian trend filtering has been developed as one of the locally adaptive smoothing methods to estimate the non-stationary trend of data. This paper develops a Bayesian trend filtering method for estimating boundary trends. To this end, the truncated multivariate normal working likelihood and global-local shrinkage
-
Bayesian nonparametric Erlang mixture modeling for survival analysis Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-27 Yunzhe Li, Juhee Lee, Athanasios Kottas
Development of a flexible Erlang mixture model for survival analysis is introduced. The model for the survival density is built from a structured mixture of Erlang densities, mixing on the integer shape parameter with a common scale parameter. The mixture weights are constructed through increments of a distribution function on the positive real line, which is assigned a Dirichlet process prior. The
-
Integrating machine learning and Bayesian nonparametrics for flexible modeling of point pattern data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-17 Matthew J. Heaton, Benjamin K. Dahl, Caleb Dayley, Richard L. Warr, Philip White
Two common approaches to analyze point pattern (location-only) data are mixture models and log-Gaussian Cox processes. The former provides a flexible model for the intensity surface at the expense of no covariate effect estimates while the latter estimates covariate effects at the expense of computation. A bridge is built between these two methods that leverages the strengths of both approaches. Namely
-
Nonparametric quantile scalar-on-image regression Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-18 Chuchu Wang, Xinyuan Song
A quantile scalar-on-image regression model is developed to comprehensively study the relationship between cognitive decline and various clinical covariates and imaging factors. As a motivating example, the high-dimensional brain imaging data from the research on Alzheimer's disease are considered predictors of patients' cognitive decline. A Bayesian nonparametric model is proposed to handle the complex
-
Distributed debiased estimation of high-dimensional partially linear models with jumps Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-12 Yan-Yong Zhao, Yuchun Zhang, Yuan Liu, Noriszura Ismail
In this paper, we focus on the estimations of both parameter vector and nonparametric component in a high-dimensional partially linear model with jumps within the framework of divide and conquer strategy. We find that a three-stage estimation procedure works well in this setting. Applying the lasso penalty and projected spline approximation, first a profiled estimator for the linear part and a projected
-
On the efficacy of higher-order spectral clustering under weighted stochastic block models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-10 Xiao Guo, Hai Zhang, Xiangyu Chang
Higher-order structures of networks, namely, small subgraphs of networks (also called network motifs), are widely known to be crucial and essential to the organization of networks. Several works have studied the community detection problem, a fundamental problem in network analysis, at the level of motifs. In particular, the higher-order spectral clustering has been developed, where the notion of motif
-
Fast same-step forecast in SUTSE model and its theoretical properties Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-05 Wataru Yoshida, Kei Hirose
The problem of forecasting multivariate time series by a Seemingly Unrelated Time Series Equations (SUTSE) model is considered. The SUTSE model usually assumes that error variables are correlated. A crucial issue is that the model estimation requires heavy computational loads because of a large matrix computation, especially for high-dimensional data. To alleviate the computational issue, a two-stage
-
Calibrated regression estimation using empirical likelihood under data fusion Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-06 Wei Li, Shanshan Luo, Wangli Xu
Data analysis based on information from different sources, typically known as the data fusion problem, is common in economic and biomedical studies. An interesting question concerns the regression of an outcome variable on certain covariates when combining two distinct datasets. These datasets consist of a primary sample containing the outcome and a subset of the covariates, and a supplemental sample
-
Additive partially linear model for pooled biomonitoring data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-10-02 Xichen Mou, Dewei Wang
Human biomonitoring involves monitoring human health by measuring the accumulation of harmful chemicals, typically in specimens like blood samples. The high cost of chemical analysis has led researchers to adopt a cost-effective approach. This approach physically combines specimens and subsequently analyzes the concentration of toxic substances within the merged pools. Consequently, there arises a
-
Detecting change structures of nonparametric regressions Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-27 Wenbiao Zhao, Lixing Zhu
This research investigates detecting change points of general nonparametric regression functions by introducing a novel criterion. It is based on the moving sums of conditional expectation to avoid both the computationally expensive algorithms that exhaustive search methods need and the false positives that hypothesis testing-based approaches encounter. This new criterion can simultaneously and consistently, in a
-
Variables selection using L0 penalty Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-26 Tonglin Zhang
The determination of a tuning parameter by the generalized information criterion (GIC) is considered an important issue in variable selection. It is shown that the GIC and the L0 penalized objective functions are equivalent, leading to a new L0 penalized maximum likelihood method for high-dimensional generalized linear models in this article. Based on the technique of the well-known discrete optimization
-
Laplace approximated quasi-likelihood method for heteroscedastic survival data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-26 Lili Yu, Yichuan Zhao
The classical accelerated failure time model is the major linear model for right censored survival data. It requires the survival data to exhibit homoscedasticity of variance and excludes heteroscedastic survival data that are often seen in practical applications. The least squares method for the classical accelerated failure time model has been extended to accommodate the heteroscedasticity in survival
-
An RIHT statistic for testing the equality of several high-dimensional mean vectors under homoskedasticity Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-25 Qiuyan Zhang, Chen Wang, Baoxue Zhang, Hu Yang
In this article, the problem of testing the equality of several mean vectors is considered under the homoskedasticity in a high-dimensional setting. A ridgelized Hotelling's T² test (RIHT) is developed and the asymptotic distributions are derived. By requiring only the conditions on the first four moments of the underlying distribution, the RIHT test can be used to test the mean vector free of population
-
GP-BART: A novel Bayesian additive regression trees approach using Gaussian processes Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-25 Mateus Maia, Keefe Murphy, Andrew C. Parnell
The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines “weak” tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and
-
Recursive ridge regression using second-order stochastic algorithms Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-22 Antoine Godichon-Baggioni, Wei Lu, Bruno Portier
Recursive second-order stochastic algorithms are presented for solving ridge regression problems in the linear and binary logistic case. The proposed algorithms allow the estimates of the ridge solution to be updated when the data arrive in a continuous flow. Some guarantees on the almost sure behavior of the proposed algorithms are established. Numerical experiments on simulated and real-world data show the
-
Hybrid exact-approximate design approach for sparse functional data Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-18 Ming-Hung Kao, Ping-Han Huang
Optimal designs for sparse functional data under the functional empirical component (FEC) settings are studied. This design issue has some unique features, making it different from classical design problems. To efficiently obtain optimal exact and approximate designs, new computational methods and useful theoretical results are developed, and a hybrid exact-approximate design approach is proposed.
-
Probability of default estimation in credit risk using mixture cure models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-14 Rebeca Peláez, Ingrid Van Keilegom, Ricardo Cao, Juan M. Vilar
An estimator of the probability of default (PD) in credit risk is proposed. It is derived from a nonparametric conditional survival function estimator based on cure models. Asymptotic expressions for the bias and the variance, as well as the asymptotic normality of the proposed estimator are presented. A simulation study shows the performance of the nonparametric estimator compared with Beran's PD
-
Standard error estimates in hierarchical generalized linear models Comput. Stat. Data Anal. (IF 1.8) Pub Date : 2023-09-14 Shaobo Jin, Youngjo Lee
Hierarchical generalized linear models are often used to fit random effects models. However, attention is mostly paid to the estimation of fixed unknown parameters and inference for latent random effects. In contrast, standard error estimators receive less attention than they should. Currently, the standard error estimators are based on various approximations, even when the mean parameters may be