Current journal: Computational Statistics
  • A dominance approach for comparing the performance of VaR forecasting models
    Comput. Stat. (IF 0.744) Pub Date : 2020-05-24
    Laura Garcia-Jorcano, Alfonso Novales

    We introduce three dominance criteria to compare the performance of alternative value at risk (VaR) forecasting models. The three criteria use the information provided by a battery of VaR validation tests based on the frequency and size of exceedances, offering the possibility of efficiently summarizing a large amount of statistical information. They do not require the use of any loss function defined

  • Estimation of parameters in multivariate wrapped models for data on a p-torus
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-24
    Anahita Nodehi, Mousa Golalizadeh, Mehdi Maadooliat, Claudio Agostinelli

    Multivariate circular observations, i.e. points on a torus, arise frequently in fields where instruments such as a compass, protractor, weather vane, sextant or theodolite are used. Multivariate wrapped models are often appropriate to describe data points scattered on a p-dimensional torus. However, the statistical inference based on such models is quite complicated since each contribution in the log-likelihood

  • R package for statistical inference in dynamical systems using kernel based gradient matching: KGode
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-23
    Mu Niu, Joe Wandy, Rónán Daly, Simon Rogers, Dirk Husmeier

    Many processes in science and engineering can be described by dynamical systems based on nonlinear ordinary differential equations (ODEs). Often ODE parameters are unknown and not directly measurable. Since nonlinear ODEs typically have no closed form solution, standard iterative inference procedures require a computationally expensive numerical integration of the ODEs every time the parameters are

  • Bayesian inference of nonlinear hysteretic integer-valued GARCH models for disease counts
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-18
    Cathy W. S. Chen, Sangyeol Lee, K. Khamthong

    This study proposes a class of nonlinear hysteretic integer-valued GARCH models in order to describe the occurrence of weekly dengue hemorrhagic fever cases via three meteorological covariates: precipitation, average temperature, and relative humidity. The proposed model adopts the hysteretic three-regime switching mechanism with a buffer zone that is able to explain various characteristics. This

  • Optimal imputation of the missing data using multi auxiliary information
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-18
    Shashi Bhushan, Abhay Pratap Pandey

    This article deals with some new imputation methods by extending the work of Bhushan and Pandey using multi-auxiliary information. Popular imputation methods such as mean imputation, the ratio method of imputation, the regression method of imputation and the power transformation method are special cases of the proposed methods, and are less efficient than them. The proposed imputation methods

  • A modified Canny edge detector based on weighted least squares
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-15
    Xu Qin

    Edge detection is the front-end processing stage in most computer vision and image understanding systems. Among various edge detection techniques, the Canny edge detector is one of the most commonly used. In this paper a modified Canny edge detection technique focusing on a change of the Sobel operator is proposed. Instead of convolution kernels, the weighted least squares method is utilized to calculate

  • Computation of the expected value of a function of a chi-distributed random variable
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-13
    Paul Kabaila, Nishika Ranathunga

    We consider the problem of numerically evaluating the expected value of a smooth bounded function of a chi-distributed random variable, divided by the square root of the number of degrees of freedom. This problem arises in the contexts of simultaneous inference, the selection and ranking of populations and in the evaluation of multivariate t probabilities. It also arises in the assessment of the coverage

  • Advanced algorithms for penalized quantile and composite quantile regression
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-12
    Matthew Pietrosanu, Jueyu Gao, Linglong Kong, Bei Jiang, Di Niu

    In this paper, we discuss a family of robust, high-dimensional regression models for quantile and composite quantile regression, both with and without an adaptive lasso penalty for variable selection. We reformulate these quantile regression problems and obtain estimators by applying the alternating direction method of multipliers (ADMM), majorize-minimization (MM), and coordinate descent (CD) algorithms

  • Dirichlet process mixtures under affine transformations of the data
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-12
    Julyan Arbel, Riccardo Corradin, Bernardo Nipoti

    Location-scale Dirichlet process mixtures of Gaussians (DPM-G) have proved extremely useful in dealing with density estimation and clustering problems in a wide range of domains. Motivated by an astronomical application, in this work we address the robustness of DPM-G models to affine transformations of the data, a natural requirement for any sensible statistical method for density estimation and clustering

  • Two generalized nonparametric methods for estimating like densities
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-12
    Zongyuan Shang, Alan Ker

    This article presents two generalized nonparametric methods for estimating multiple, possibly like, densities. The first generalization contains the Nadaraya–Watson estimator, the Jones et al. (Biometrika 82(2):327–338, 1995) bias reduction estimator, and Ker (Stat Probab Lett 117:23–30, 2016) possibly similar estimator as special cases. The second generalization contains the Nadaraya–Watson estimator

  • Robust weighted Gaussian processes
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-09
    Ruben Ramirez-Padron, Boris Mederos, Avelino J. Gonzalez

    This paper presents robust weighted variants of batch and online standard Gaussian processes (GPs) to effectively reduce the negative impact of outliers in the corresponding GP models. This is done by introducing robust data weighers that rely on robust and quasi-robust weight functions that come from robust M-estimators. Our robust GPs are compared to various GP models on four datasets. It is shown

  • Penalized weighted composite quantile regression for partially linear varying coefficient models with missing covariates
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-09
    Jun Jin, Tiefeng Ma, Jiajia Dai, Shuangzhe Liu

    In this paper we study partially linear varying coefficient models with missing covariates. Based on inverse probability-weighting and B-spline approximations, we propose a weighted B-spline composite quantile regression method to estimate the non-parametric function and the regression coefficients. Under some mild conditions, we establish the asymptotic normality and Horvitz–Thompson property of the

  • Greedy clustering of count data through a mixture of multinomial PCA
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-08
    Nicolas Jouvin, Pierre Latouche, Charles Bouveyron, Guillaume Bataillon, Alain Livartowski

    Count data is becoming more and more ubiquitous in a wide range of applications, with datasets growing both in size and in dimension. In this context, an increasing amount of work is dedicated to the construction of statistical models directly accounting for the discrete nature of the data. Moreover, it has been shown that integrating dimension reduction to clustering can drastically improve performance

  • Transformation mixture modeling for skewed data groups with heavy tails and scatter
    Comput. Stat. (IF 0.744) Pub Date : 2020-07-06
    Yana Melnykov, Xuwen Zhu, Volodymyr Melnykov

    For decades, Gaussian mixture models have been the most popular mixtures in the literature. However, the adequacy of the fit provided by Gaussian components is often in question. Various distributions capable of modeling skewness or heavy tails have been considered in this context recently. In this paper, we propose a novel contaminated transformation mixture model that is constructed based on the idea

  • KLERC: kernel Lagrangian expectile regression calculator
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-25
    Songfeng Zheng

    As a generalization of ordinary least squares regression, expectile regression, which can predict conditional expectiles, is fitted by minimizing an asymmetric square loss function on the training data. In the literature, the idea of the support vector machine was introduced to expectile regression to increase the flexibility of the model, resulting in support vector expectile regression (SVER). This paper

  • An accelerated EM algorithm for mixture models with uncertainty for rating data
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-22
    Rosaria Simone

    The paper is framed within the literature around Louis’ identity for the observed information matrix in incomplete data problems, with a focus on the implied acceleration of maximum likelihood estimation for mixture models. The goal is twofold: to obtain direct expressions for standard errors of parameters from the EM algorithm and to reduce the computational burden of the estimation procedure for

  • Clustering method for censored and collinear survival data
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-21
    Silvia Liverani, Lucy Leigh, Irene L. Hudson, Julie E. Byles

    In this paper we propose a Dirichlet process mixture model for censored survival data with covariates. This model is suitable in two scenarios. First, this method can be used to identify clusters determined by both the censored survival data and the predictors. Second, this method is suitable for highly correlated predictors, in cases when the usual survival models cannot be implemented because they

  • A Bayesian quantile regression approach to multivariate semi-continuous longitudinal data
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-20
    Jayabrata Biswas, Kiranmoy Das

    Quantile regression is a powerful tool for modeling non-Gaussian data, and also for modeling different quantiles of the probability distributions of the responses. We propose a Bayesian approach of estimating the quantiles of multivariate longitudinal data where the responses contain excess zeros. We consider a Tobit regression approach, where the latent responses are estimated using a linear mixed

  • Usage of the GO estimator in high dimensional linear models
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-18
    Murat Genç, M. Revan Özkale

    This paper discusses simultaneous parameter estimation and variable selection and presents a new penalized regression method. The method is based on the idea that the coefficient estimates are shrunken towards a predetermined coefficient vector which represents the prior information. This method can result in smaller length estimates of the coefficients depending on the prior information compared to

  • Bayesian joint-quantile regression
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-15
    Yingying Hu, Huixia Judy Wang, Xuming He, Jianhua Guo

    Estimation of low or high conditional quantiles is called for in many applications, but commonly encountered data sparsity at the tails of distributions makes this a challenging task. We develop a Bayesian joint-quantile regression method to borrow information across tail quantiles through a linear approximation of quantile coefficients. Motivated by a working likelihood linked to the asymmetric Laplace

  • What is an optimal value of k in k-fold cross-validation in discrete Bayesian network analysis?
    Comput. Stat. (IF 0.744) Pub Date : 2020-06-13
    Bruce G. Marcot, Anca M. Hanea

    Cross-validation using randomized subsets of data—known as k-fold cross-validation—is a powerful means of testing the success rate of models used for classification. However, few if any studies have explored how values of k (number of subsets) affect validation results in models tested with data of known statistical properties. Here, we explore conditions of sample size, model structure, and variable

  • Time-dependent stress–strength reliability models based on phase type distribution
    Comput. Stat. (IF 0.744) Pub Date : 2020-05-10
    Joby K. Jose, M. Drisya

    In many real-life situations, the strength of a system and the stress applied to it change over time. In this paper, we consider time-dependent stress–strength reliability models subjected to random stresses at random cycles of time. Each run of the system causes a change in the strength of the system over time. We obtain the stress–strength reliability of the system at time t when the initial

  • Clustering multivariate functional data in group-specific functional subspaces
    Comput. Stat. (IF 0.744) Pub Date : 2020-02-12
    Amandine Schmutz, Julien Jacques, Charles Bouveyron, Laurence Chèze, Pauline Martin

    With the emergence of numerical sensors in many aspects of everyday life, there is an increasing need to analyze multivariate functional data. This work focuses on the clustering of such functional data, in order to ease their modeling and understanding. To this end, a novel clustering technique for multivariate functional data is presented. This method is based on a functional latent mixture model

  • Modelling rankings in R: the PlackettLuce package
    Comput. Stat. (IF 0.744) Pub Date : 2020-02-12
    Heather L. Turner, Jacob van Etten, David Firth, Ioannis Kosmidis

    This paper presents the R package PlackettLuce, which implements a generalization of the Plackett–Luce model for rankings data. The generalization accommodates both ties (of arbitrary order) and partial rankings (complete rankings of subsets of items). By default, the implementation adds a set of pseudo-comparisons with a hypothetical item, ensuring that the underlying network of wins and losses between

  • A Bayesian approach to estimate parameters of ordinary differential equation
    Comput. Stat. (IF 0.744) Pub Date : 2020-02-10
    Hanwen Huang, Andreas Handel, Xiao Song

    We develop a Bayesian approach to estimate the parameters of ordinary differential equations (ODE) from the observed noisy data. Our method does not need to solve ODE directly. We replace the ODE constraint with a probability expression and combine it with the nonparametric data fitting procedure into a joint likelihood framework. One advantage of the proposed method is that for some ODE systems, one

  • Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization
    Comput. Stat. (IF 0.744) Pub Date : 2020-02-07
    Huiwen Wang, Ruiping Liu, Shanshan Wang, Zhichao Wang, Gilbert Saporta

    Independence screening procedure plays a vital role in variable selection when the number of variables is massive. However, high dimensionality of the data may bring in many challenges, such as multicollinearity or high correlation (possibly spurious) between the covariates, which results in marginal correlation being unreliable as a measure of association between the covariates and the response. We

  • Ascent with quadratic assistance for the construction of exact experimental designs
    Comput. Stat. (IF 0.744) Pub Date : 2020-02-04
    Lenka Filová, Radoslav Harman

    In the area of statistical planning, there is a large body of theoretical knowledge and computational experience concerning so-called optimal approximate designs of experiments. However, for an approximate design to be realizable, it must be converted into an exact, i.e., integer, design, which is usually done via rounding procedures. Although rapid, rounding procedures often yield worse exact designs

  • Discrete factor analysis using a dependent Poisson model
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-31
    Rolf Larsson

    In this paper, we present a method for factor analysis of discrete data. This is accomplished by fitting a dependent Poisson model with a factor structure. To be able to analyze ordinal data, we also consider a truncated Poisson distribution. We try to find the model with the lowest AIC by employing a forward selection procedure. The probability of finding the correct model is investigated in a simulation

  • Bayesian multiple changepoints detection for Markov jump processes
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-25
    Lu Shaochuan

    A Bayesian multiple changepoint model for the Markov jump process is formulated as a Markov double chain model in continuous time. Inference for this type of multiple changepoint model is based on a two-block Gibbs sampling scheme. We suggest a continuous-time version of the forward-filtering backward-sampling (FFBS) algorithm for sampling the full trajectories of the latent Markov chain via inverse transformation

  • Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-24
    Ursula Laa, Dianne Cook

    Projection pursuit is used to find interesting low-dimensional projections of high-dimensional data by optimizing an index over all possible projections. Most indexes have been developed to detect departure from known distributions, such as normality, or to find separations between known groups. Here, we are interested in finding projections revealing potentially complex bivariate patterns, using new

  • smoothROCtime: an R package for time-dependent ROC curve estimation
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-20
    Susana Díaz-Coto, Pablo Martínez-Camblor, Sonia Pérez-Fernández

    The receiver operating characteristic (ROC) curve has become one of the most used tools for analyzing the diagnostic capacity of continuous biomarkers. When the studied outcome is a time-dependent variable, two main generalizations have been proposed, based on proper extensions of the sensitivity and the specificity. Different procedures have been suggested for their estimation mainly under the presence

  • Estimation and determinants of Chinese banks’ total factor efficiency: a new vision based on unbalanced development of Chinese banks and their overall risk
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-08
    Shiyi Chen, Wolfgang K. Härdle, Li Wang

    The paper estimates banks’ total factor efficiency (TFE) as well as the TFE of each production factor by incorporating banks’ overall risk endogenously into banks’ production process as an undesirable by-product in a Global-SMB Model. Our results show that, compared with a model incorporated with banks’ overall risk, a model considering only on-balance-sheet risk may over-estimate the integrated TFE (TFIE)

  • Parallel computing in linear mixed models
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-07
    Fulya Gokalp Yavuz, Barret Schloerke

    In this study, we propose a parallel programming method for linear mixed models (LMM) generated from big data. A commonly used algorithm, expectation maximization (EM), is preferred for its use of maximum likelihood estimations, as the estimations are stable and simple. However, EM has a high computation cost. In our proposed method, we use a divide-and-recombine approach to split the data into smaller subsets

  • Modelling dependency effect to extreme value distributions with application to extreme wind speed at Port Elizabeth, South Africa: a frequentist and Bayesian approaches
    Comput. Stat. (IF 0.744) Pub Date : 2020-01-03
    Tadele Akeba Diriba, Legesse Kassa Debusho

    The dependency effect in extreme value distributions (EVDs) has been modelled using frequentist and Bayesian approaches to analyse the extremes of annual and daily maximum wind speed at Port Elizabeth, South Africa. In the frequentist approach, the parameters of EVDs were estimated using maximum likelihood, whereas in the Bayesian approach the Markov Chain Monte Carlo technique with the Metropolis–Hastings

  • Efficient inference in state-space models through adaptive learning in online Monte Carlo expectation maximization
    Comput. Stat. (IF 0.744) Pub Date : 2019-12-03
    Donna Henderson, Gerton Lunter

    Expectation maximization (EM) is a technique for estimating maximum-likelihood parameters of a latent variable model given observed data by alternating between taking expectations of sufficient statistics, and maximizing the expected log likelihood. For situations where sufficient statistics are intractable, stochastic approximation EM (SAEM) is often used, which uses Monte Carlo techniques to approximate

  • Data driven value-at-risk forecasting using a SVR-GARCH-KDE hybrid
    Comput. Stat. (IF 0.744) Pub Date : 2019-11-13
    Marius Lux, Wolfgang Karl Härdle, Stefan Lessmann

    Appropriate risk management is crucial to ensure the competitiveness of financial institutions and the stability of the economy. One widely used financial risk measure is value-at-risk (VaR). VaR estimates based on linear and parametric models can lead to biased results or even underestimation of risk due to time varying volatility, skewness and leptokurtosis of financial return series. The paper proposes

  • On linearized ridge logistic estimator in the presence of multicollinearity
    Comput. Stat. (IF 0.744) Pub Date : 2019-11-11
    N. H. Jadhav

    Logistic regression is a very popular method to model dichotomous data. The maximum likelihood estimator (MLE) of unknown regression parameters of the logistic regression is not very accurate when multicollinearity exists among the covariates. It is well known that the presence of multicollinearity increases the variance of the MLE. To diminish the inflated mean square error (MSE) of the MLE due

  • A support vector machine based semiparametric mixture cure model
    Comput. Stat. (IF 0.744) Pub Date : 2019-11-04
    Peizhi Li, Yingwei Peng, Ping Jiang, Qingli Dong

    The mixture cure model is an extension of standard survival models to analyze survival data with a cured fraction. Many developments in recent years focus on the latency part of the model to allow more flexible modeling strategies for the distribution of uncured subjects, and fewer studies focus on the incidence part to model the probability of being uncured/cured. We propose a new mixture cure model

  • A fast imputation algorithm in quantile regression
    Comput. Stat. (IF 0.744) Pub Date : 2019-03-19
    Hao Cheng, Ying Wei

    In many applications, some covariates could be missing for various reasons. Regression quantiles could be either biased or under-powered when ignoring the missing data. Multiple imputation and EM-based augment approach have been proposed to fully utilize the data with missing covariates for quantile regression. Both methods however are computationally expensive. We propose a fast imputation algorithm

  • Neural network gradient Hamiltonian Monte Carlo
    Comput. Stat. (IF 0.744) Pub Date : 2019-01-08
    Lingge Li, Andrew Holbrook, Babak Shahbaba, Pierre Baldi

    Hamiltonian Monte Carlo is a widely used algorithm for sampling from posterior distributions of complex Bayesian models. It can efficiently explore high-dimensional parameter spaces guided by simulated Hamiltonian flows. However, the algorithm requires repeated gradient calculations, and these computations become increasingly burdensome as data sets scale. We present a method to substantially reduce

  • Fusion Learning Algorithm to Combine Partially Heterogeneous Cox Models
    Comput. Stat. (IF 0.744) Pub Date : 2018-07-17
    Lu Tang, Ling Zhou, Peter X K Song

    We propose a fusion learning procedure to perform regression coefficients clustering in the Cox proportional hazards model when parameters are partially heterogeneous across certain predefined subgroups, such as age groups. One major issue pertains to the fact that the same covariate may have different influence on the survival time across different subgroups. Learning differences in covariate effects

  • Statistical inference in mechanistic models: time warping for improved gradient matching
    Comput. Stat. (IF 0.744) Pub Date : 2018-01-01
    Mu Niu, Benn Macdonald, Simon Rogers, Maurizio Filippone, Dirk Husmeier

    Inference in mechanistic models of non-linear differential equations is a challenging problem in current computational statistics. Due to the high computational costs of numerically solving the differential equations in every step of an iterative parameter adaptation scheme, approximate methods based on gradient matching have become popular. However, these methods critically depend on the smoothing

  • Frequentist Standard Errors of Bayes Estimators
    Comput. Stat. (IF 0.744) Pub Date : 2017-09-26
    DongHyuk Lee, Raymond J Carroll, Samiran Sinha

    Frequentist standard errors are a measure of uncertainty of an estimator, and the basis for statistical inferences. Frequentist standard errors can also be derived for Bayes estimators. However, except in special cases, the computation of the standard error of Bayesian estimators requires bootstrapping, which in combination with Markov chain Monte Carlo (MCMC) can be highly time consuming. We discuss

  • On the impact of model selection on predictor identification and parameter inference
    Comput. Stat. (IF 0.744) Pub Date : 2017-07-12
    Ruth M Pfeiffer, Andrew Redd, Raymond J Carroll

    We assessed the ability of several penalized regression methods for linear and logistic models to identify outcome-associated predictors and the impact of predictor selection on parameter inference for practical sample sizes. We studied effect estimates obtained directly from penalized methods (Algorithm 1), or by refitting selected predictors with standard regression (Algorithm 2). For linear models

  • A note on estimating the bent line quantile regression model
    Comput. Stat. (IF 0.744) Pub Date : 2017-06-01
    Yanyang Yan, Feipeng Zhang, Xiaoying Zhou

    This paper considers a new estimating method for the bent line quantile regression model. By a simple linearization technique, the proposed method can simultaneously obtain the estimates of the regression coefficients and the change-point location. Moreover, it can be readily implemented by current software. Simulation studies demonstrate that the proposed method has good finite sample performance

  • Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions
    Comput. Stat. (IF 0.744) Pub Date : 2016-05-10
    Ivo D Dinov, Kyle Siegrist, Dennis K Pearl, Alexandr Kalinin, Nicolas Christou

    Probability distributions are useful for modeling, simulation, analysis, and inference on varieties of natural processes and physical phenomena. There are uncountably many probability distributions. However, a few dozen families of distributions are commonly defined and are frequently used in practice for problem solving, experimental applications, and theoretical studies. In this paper, we present

  • On maximum likelihood estimation of the concentration parameter of von Mises-Fisher distributions
    Comput. Stat. (IF 0.744) Pub Date : 2014-10-14
    Kurt Hornik, Bettina Grün

    Maximum likelihood estimation of the concentration parameter of von Mises-Fisher distributions involves inverting the ratio [Formula: see text] of modified Bessel functions and computational methods are required to invert these functions using approximative or iterative algorithms. In this paper we use Amos-type bounds for [Formula: see text] to deduce sharper bounds for the inverse function, determine

  • Bayesian model-based tight clustering for time course data
    Comput. Stat. (IF 0.744) Pub Date : 2010-03-01
    Yongsung Joo, G Casella, J Hobert

    Cluster analysis has been widely used to explore thousands of gene expressions from microarray analysis and identify a small number of similar genes (objects) for further detailed biological investigation. However, most clustering algorithms tend to identify loose clusters with too many genes. In this paper, we propose a Bayesian tight clustering method for time course gene expression data, which selects

  • Geographic Information Systems
    Comput. Stat. (IF 0.744) Pub Date : 2009-01-01
    William F Wieczorek, Alan M Delmerico

    This chapter presents an overview of the development, capabilities, and utilization of geographic information systems (GIS). There is a nearly unlimited number of applications that are relevant to GIS because virtually all human interactions, natural and man-made features, resources, and populations have a geographic component. Everything happens somewhere and the location often has a role that affects

  • A Parametric k-Means Algorithm
    Comput. Stat. (IF 0.744) Pub Date : 2007-10-06
    Thaddeus Tarpey

    The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced

  • Random forest with acceptance–rejection trees
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-29
    Peter Calhoun, Melodie J. Hallett, Xiaogang Su, Guy Cafri, Richard A. Levine, Juanjuan Fan

    In this paper, we propose a new random forest method based on completely randomized splitting rules with an acceptance–rejection criterion for quality control. We show how the proposed acceptance–rejection (AR) algorithm can outperform the standard random forest algorithm (RF) and some of its variants including extremely randomized (ER) trees and smooth sigmoid surrogate (SSS) trees. Twenty datasets

  • Near G-optimal Tchakaloff designs
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-25
    Len Bos, Federico Piazzon, Marco Vianello

    We show that the notion of polynomial mesh (norming set), used to provide discretizations of a compact set nearly optimal for certain approximation theoretic purposes, can also be used to obtain finitely supported near G-optimal designs for polynomial regression. We approximate such designs by a standard multiplicative algorithm, followed by measure concentration via Caratheodory-Tchakaloff compression

  • A simple method for implementing Monte Carlo tests
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-19
    Dong Ding, Axel Gandy, Georg Hahn

    We consider a statistical test whose p value can only be approximated using Monte Carlo simulations. We are interested in deciding whether the p value for an observed data set lies above or below a given threshold such as 5%. We want to ensure that the resampling risk, the probability of the (Monte Carlo) decision being different from the true decision, is uniformly bounded. This article introduces

  • Detection and estimation of additive outliers in seasonal time series
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-15
    Francesco Battaglia, Domenico Cucina, Manuel Rizzo

    The detection of outliers in a time series is an important issue because their presence may have serious negative effects on the analysis in many different ways. Moreover the presence of a complex seasonal pattern in the series could affect the properties of the usual outlier detection procedures. Therefore modelling the appropriate form of seasonality is a very important step when outliers are present

  • Estimation of a zero-inflated Poisson regression model with missing covariates via nonparametric multiple imputation methods
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-14
    Shen-Ming Lee, T. Martin Lukusa, Chin-Shang Li

    Zero-inflated Poisson (ZIP) regression is widely applied to model effects of covariates on an outcome count with excess zeros. In some applications, covariates in a ZIP regression model are partially observed. Based on the imputed data generated by applying the multiple imputation (MI) schemes developed by Wang and Chen (Ann Stat 37:490–517, 2009), two methods are proposed to estimate the parameters

  • Application of the sequential matrix diagonalization algorithm to high-dimensional functional MRI data
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-09
    Manuel Carcenac, Soydan Redif

    This paper introduces an adaptation of the sequential matrix diagonalization (SMD) method to high-dimensional functional magnetic resonance imaging (fMRI) data. SMD is currently the most efficient statistical method to perform polynomial eigenvalue decomposition. Unfortunately, with current implementations based on dense polynomial matrices, the algorithmic complexity of SMD is intractable and it cannot

  • Diagnosis and quantification of the non-essential collinearity
    Comput. Stat. (IF 0.744) Pub Date : 2019-10-04
    Román Salmerón-Gómez, Ainara Rodríguez-Sánchez, Catalina García-García

    Marquardt and Snee (Am Stat 29(1):3–20, 1975), Marquardt (J Am Stat Assoc 75(369):87–91, 1980) and Snee and Marquardt (Am Stat 38(2):83–87, 1984) refer to non-essential multicollinearity as that caused by the relation with the independent term. Although it is clear that the solution is to center the independent variables in the regression model, it is unclear when this kind of collinearity exists.

  • Real-manufacturing-oriented big data analysis and data value evaluation with domain knowledge
    Comput. Stat. (IF 0.744) Pub Date : 2019-09-23
    Weichang Kong, Fei Qiao, Qidi Wu

    As one of the most popular topics currently, big data has played an important role in both academic research and practical applications. However, in the manufacturing industry, it is difficult to make full use of the research results for production optimization and/or management due to the low quality of real workshop data. Typical quality problems of real workshop data include the information match

  • Closed-form maximum likelihood estimator for generalized linear models in the case of categorical explanatory variables: application to insurance loss modeling
    Comput. Stat. (IF 0.744) Pub Date : 2019-08-22
    Alexandre Brouste, Christophe Dutang, Tom Rohmer

    Generalized linear models with categorical explanatory variables are considered and parameters of the model are estimated by an exact maximum likelihood method. The existence of a sequence of maximum likelihood estimators is discussed and considerations on possible link functions are proposed. A focus is then given on two particular positive distributions: the Pareto 1 distribution and the shifted

  • Improving accuracy of financial distress prediction by considering volatility: an interval-data-based discriminant model
    Comput. Stat. (IF 0.744) Pub Date : 2019-08-20
    Rong Guan, Huiwen Wang, Haitao Zheng

    Financial distress prediction models are much challenged in identifying a distressed company two or more years prior to the occurrence of its actual distress, on the grounds that the distress signal is too weak to be captured at an early stage. The paper innovatively proposes to predict the distressed companies by a factorial discriminant model based on interval data. The main idea is that we use a

Contents have been reproduced by permission of the publishers.