-
Forecasting Hurricane‐Related Power Outages via Locally Optimized Random Forests Stat (IF 0.766) Pub Date : 2021-01-16 Tim Coleman; Mary Frances Dorn; Kim Kaufeld; Lucas Mentch
Standard supervised learning procedures are validated against a test set that is assumed to have come from the same distribution as the training data. However, in many problems, the test data may have come from a different distribution. We consider the case of having many labeled observations from one distribution, P1, and wanting to make predictions at unlabeled points that come from P2. We combine
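The abstract breaks off, but the setting it describes (labelled data from P1, prediction under P2) is the classic covariate-shift problem. As a rough point of reference only, and not the authors' locally optimized forests, a generic importance-weighting baseline estimates density-ratio weights with a probabilistic classifier and passes them to a weighted random forest; all data and parameter choices below are illustrative.

```python
# Hedged sketch: importance-weighted random forest under covariate shift.
# Not the paper's locally optimized method; a generic baseline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(500, 3))            # labelled covariates from P1
y1 = X1[:, 0] + 0.5 * X1[:, 1] ** 2 + rng.normal(0, 0.1, 500)
X2 = rng.normal(0.7, 1.2, size=(300, 3))            # unlabelled covariates from P2

# Estimate w(x) ~ p2(x)/p1(x) by classifying "came from P2" vs "came from P1".
Z = np.vstack([X1, X2])
s = np.r_[np.zeros(len(X1)), np.ones(len(X2))]
clf = LogisticRegression(max_iter=1000).fit(Z, s)
p = clf.predict_proba(X1)[:, 1]
w = (p / (1 - p)) * (len(X1) / len(X2))             # density-ratio weights for P1 points

rf = RandomForestRegressor(n_estimators=300, random_state=0)
rf.fit(X1, y1, sample_weight=w)                     # weighted training on P1
pred_at_P2 = rf.predict(X2)                         # predictions at the unlabelled points
```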
-
A fast algorithm for integrative community detection of multi‐layer networks Stat (IF 0.766) Pub Date : 2021-01-15 Jiangzhou Wang; Jianhua Guo; Binghui Liu
Multi‐layer networks are often used to represent multiple types of relationships between nodes in network studies. In this paper, we investigate the community detection problem in multi‐layer networks. Specifically, we consider the multi‐layer stochastic block model (MLSBM), which assumes that the community memberships are shared across all network layers, while other model parameters can be different between
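For context, a standard way to write down a multi-layer SBM with memberships shared across layers is given below; this is the generic formulation, and the paper's exact parameterization may differ.

```latex
% Shared community labels z_i \in \{1,\dots,K\} and layer-specific
% connectivity matrices B^{(l)}, l = 1,\dots,L:
A^{(l)}_{ij} \mid z_i, z_j \;\overset{\text{ind}}{\sim}\;
  \mathrm{Bernoulli}\!\left(B^{(l)}_{z_i z_j}\right),
\qquad l = 1,\dots,L,\;\; i < j.
```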
-
Semiparametric Bayes Instrumental Variable Estimation with Many Weak Instruments Stat (IF 0.766) Pub Date : 2021-01-15 Ryo Kato; Takahiro Hoshino
We develop a new semiparametric Bayes instrumental variables estimation method. We model the regression function of the first‐stage equation and the disturbances nonparametrically to achieve better predictive power for the endogenous variables, whereas we use a parametric formulation in the second‐stage equation, which is of interest for inference. Our simulation studies show
-
Detecting changes in mean in the presence of time‐varying autocovariance Stat (IF 0.766) Pub Date : 2021-01-15 Euan T. McGonigle; Rebecca Killick; Matthew A. Nunes
There has been much attention in recent years to the problem of detecting mean changes in a piecewise constant time series. Often, methods assume that the noise can be taken to be independent, identically distributed (IID), which in practice may not be a reasonable assumption. There is comparatively little work studying the problem of mean changepoint detection in time series with non‐trivial autocovariance
-
Weighted empirical likelihood for heteroscedastic varying coefficient partially nonlinear models with missing data Stat (IF 0.766) Pub Date : 2021-01-15 Guo‐Liang Fan; Lu‐Lu Wang; Hong‐Xia Xu
In this article, a weighted empirical likelihood technique for constructing the empirical likelihood confidence regions is applied to study the heteroscedastic varying coefficient partially nonlinear models with missing response data. We first give the estimator of the error variance based on the Nadaraya‐Watson kernel estimation method. Then a weighted empirical log‐likelihood ratio of the unknown
-
Multilevel Joint Modeling of Hospitalization and Survival in Patients on Dialysis Stat (IF 0.766) Pub Date : 2021-01-15 Esra Kürüm; Danh V. Nguyen; Yihao Li; Connie M. Rhee; Kamyar Kalantar‐Zadeh; Damla Şentürk
More than 720,000 patients with end‐stage renal disease in the U.S. require life‐sustaining dialysis treatment. In this population of typically older patients with a high morbidity burden, hospitalization is frequent, at a rate of about twice per patient‐year. Aside from frequent hospitalizations, which are a major source of death risk, overall mortality in dialysis patients is higher than in other comparable
-
When will gradient methods converge to max‐margin classifier under ReLU models? Stat (IF 0.766) Pub Date : 2020-12-31 Tengyu Xu; Yi Zhou; Kaiyi Ji; Yingbin Liang
We study the implicit bias of gradient descent methods in solving a binary classification problem over a linearly separable dataset. The classifier is described by a nonlinear ReLU model and the objective function adopts the exponential loss function. We first characterize the landscape of the loss function and show that there can exist spurious asymptotic local minima besides asymptotic global minima
-
Low Rank Approximation for Smoothing Spline via Eigensystem Truncation Stat (IF 0.766) Pub Date : 2020-12-29 Danqing Xu; Yuedong Wang
Smoothing splines provide a powerful and flexible means for nonparametric estimation and inference. With a cubic time complexity, fitting smoothing spline models to large data is computationally prohibitive. In this paper, we use the theoretical optimal eigenspace to derive a low rank approximation of the smoothing spline estimates. We develop a method to approximate the eigensystem when it is unknown
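The paper's construction is specific to smoothing splines and their theoretically optimal eigenspace; as a loose analogue only, the sketch below uses scikit-learn's Nystroem transformer to form a low-rank kernel approximation followed by a ridge fit. The kernel, bandwidth, and rank are illustrative placeholders, not the authors' choices.

```python
# Hedged sketch: low-rank kernel smoothing via Nystroem features + ridge,
# standing in for an eigensystem-truncated smoothing spline fit.
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 5000))[:, None]
y = np.sin(6 * np.pi * x[:, 0]) + rng.normal(0, 0.3, len(x))

# Rank-100 approximation instead of working with the full 5000 x 5000 kernel system.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=50.0, n_components=100, random_state=0),
    Ridge(alpha=1e-3),
)
model.fit(x, y)
yhat = model.predict(x)
```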
-
MuSP: A Multi‐step Screening Procedure for Sparse Recovery Stat (IF 0.766) Pub Date : 2020-12-25 Yuehan Yang; Ji Zhu; Edward I. George
We propose a Multi‐step Screening Procedure (MuSP) for the recovery of sparse linear models in high‐dimensional data. This method is based on a repeated small penalty strategy that quickly converges to an estimate within a few iterations. Specifically, in each iteration, an adaptive lasso regression with a small penalty is fit within the reduced feature space obtained from the previous step, rendering
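Reading the description, the core device is an adaptive lasso with a small penalty refit repeatedly on a shrinking feature set. Below is a hedged sketch of that general idea, not the exact MuSP algorithm; the penalty level, threshold, and stopping rule are placeholders.

```python
# Hedged sketch of a multi-step, small-penalty adaptive-lasso iteration
# in the spirit of MuSP; not the authors' exact procedure.
import numpy as np
from sklearn.linear_model import Lasso

def multi_step_lasso(X, y, penalty=0.01, n_steps=3, tol=1e-6):
    """Repeated small-penalty adaptive lasso on a shrinking feature set."""
    n, p = X.shape
    active = np.arange(p)
    weights = np.ones(p)                          # initial adaptive weights (plain lasso)
    beta = np.zeros(p)
    for _ in range(n_steps):
        Xa = X[:, active] * weights[active]       # adaptive lasso via column rescaling
        fit = Lasso(alpha=penalty, max_iter=10000).fit(Xa, y)
        coef = fit.coef_ * weights[active]        # back-transform to the original scale
        beta = np.zeros(p)
        beta[active] = coef
        keep = np.abs(coef) > tol                 # drop (near-)zero coefficients
        if not keep.any():
            break
        active = active[keep]
        weights = np.zeros(p)
        weights[active] = np.abs(beta[active])    # next-step weights = current magnitudes
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 200))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=100)
beta_hat = multi_step_lasso(X, y)
```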
-
Efficient Split Likelihood‐based Method for Community Detection of Large‐scale Networks Stat (IF 0.766) Pub Date : 2020-12-18 Jiangzhou Wang; Binghui Liu; Jianhua Guo
The stochastic block model (SBM) is widely employed as a canonical model for network community detection. Recovering community labels under the SBM is not a trivial task, since its theoretical optimization problem is NP‐hard. To solve this problem, numerous statistical methods have been developed in the literature, most of which are, however, not applicable to large‐scale networks. To overcome this limitation
-
Visualizing the Food Landscape of Durham, North Carolina Stat (IF 0.766) Pub Date : 2020-12-17 Joseph L. Graves; Gizem Templeton; Lauren Davis; Seong‐Tae Kim
In partnership with community leaders of Durham, North Carolina, the Duke World Food Policy Center is creating a Durham Food Justice Plan (DFJP) for envisioning an equitable food system. The Food Justice plan serves to incorporate Durham’s local food history in terms of combating historical and present injustices in the food system. We propose creating an integrative, interactive visual for DFJP to
-
Predicting Lifespan of Drosophila Melanogaster: A Novel Application of Convolutional Neural Networks and Zero‐Inflated Autoregressive Conditional Poisson Model Stat (IF 0.766) Pub Date : 2020-12-08 Yi Zhang; V.A. Samaranayake; Gayla R. Olbricht; Matthew Thimgan
A model to classify the lifespan of Drosophila, the fruit fly, into short and long‐lived categories based on a sleep characteristic, extracted from activity data, is developed using a two‐stage process. Stage one models the per minute activity counts of each fly using a zero‐inflated Autoregressive Conditional Poisson model. These probabilities are allowed to vary hourly, reflecting the circadian and
-
Weight Normalized Deep Neural Networks Stat (IF 0.766) Pub Date : 2020-12-08 Yixi Xu; Xiao Wang
The generalization error is the difference between the expected risk and the empirical risk of a learning algorithm. This generalization error can be upper bounded by the Rademacher complexity of the underlying hypothesis class with high probability. This paper studies the function class of L_{p,q} weight‐normalized deep neural networks. We present a general framework for norm‐based capacity control and
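For readers who want the bound being referred to, the classical Rademacher-complexity generalization bound for a loss taking values in [0,1] reads as follows; this is the textbook result, not the paper's L_{p,q}-specific capacity bound.

```latex
% With probability at least 1-\delta over an i.i.d. sample of size n,
% uniformly over f in the hypothesis class \mathcal{F}:
R(f) \;\le\; \widehat{R}_n(f) \;+\; 2\,\mathfrak{R}_n(\mathcal{F})
          \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}.
```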
-
Outcome weighted ψ‐learning for individualized treatment rules Stat (IF 0.766) Pub Date : 2020-12-07 Mingyang Liu; Xiaotong Shen; Wei Pan
An individualized treatment rule is often employed to maximize a certain patient‐specific clinical outcome based on his/her clinical or genomic characteristics as well as heterogeneous response to treatments. Although developing such a rule is conceptually important to personalized medicine, existing methods such as the penalized least squares of Qian and Murphy (2011) suffer from the difficulty of indirect
-
On the Estimation Bias in First‐Order Bifurcating Autoregressive Models Stat (IF 0.766) Pub Date : 2020-12-07 Tamer M. Elbayoumi; Sayed A. Mostafa
In this paper, we study the bias of least‐squares (LS) estimation for the stationary first‐order bifurcating autoregressive [BAR(1)] model, which is commonly used to model binary tree‐structured data that appear in many applications, most famously cell‐lineage applications. We first show that the LS estimator can have large bias for both small and moderate‐sized samples and that this bias is dependent
-
Sub‐Weibull distributions: Generalizing sub‐Gaussian and sub‐Exponential properties to heavier tailed distributions Stat (IF 0.766) Pub Date : 2020-10-01 Mariia Vladimirova; Stéphane Girard; Hien Nguyen; Julyan Arbel
We propose the notion of sub‐Weibull distributions, which are characterized by tails lighter than (or equally light as) the right tail of a Weibull distribution. This novel class generalizes the sub‐Gaussian and sub‐Exponential families to potentially heavier tailed distributions. Sub‐Weibull distributions are parameterized by a positive tail index θ and reduce to sub‐Gaussian distributions for θ =
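One common way to state the tail condition is given below; this is a standard characterization that may differ from the authors' in constants.

```latex
% X is sub-Weibull with tail index \theta > 0 if, for some K > 0,
\mathbb{P}\bigl(|X| \ge x\bigr) \;\le\; 2\exp\!\left\{-\bigl(x/K\bigr)^{1/\theta}\right\}
\quad \text{for all } x \ge 0.
% Under this characterization, \theta = 1/2 recovers sub-Gaussian tails
% and \theta = 1 recovers sub-Exponential tails.
```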
-
Better Together: Extending JMP with Open Source Software Stat (IF 0.766) Pub Date : 2020-12-03 Nascif Abousalh‐Neto; Meijian Guan; Ruth Hummel
JMP is commercial software designed for interactive data analysis and exploration. JMP’s high‐level, visual interface makes it an outstanding tool for teaching best practices, methods, and model building techniques. JMP is also designed for extensibility, with features that allow the embedding of and deployment to open source packages and environments. In this paper, we will explore use cases that
-
On a new test of fit to the beta distribution Stat (IF 0.766) Pub Date : 2020-12-02 Bruno Ebner; Shawn C. Liebenberg
We propose a new L2‐type goodness‐of‐fit test for the family of beta distributions based on a conditional moment characterisation. The asymptotic null distribution is identified, and since it depends on the underlying parameters, a parametric bootstrap procedure is proposed. Consistency against all alternatives that satisfy a convergence criterion is shown, and a Monte Carlo simulation study indicates
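Since the null distribution depends on the estimated parameters, the test is calibrated by a parametric bootstrap. A generic sketch of that device follows, with a Kolmogorov–Smirnov statistic standing in for the paper's L2-type statistic and scipy.stats used for fitting and simulation.

```python
# Hedged sketch: parametric bootstrap goodness-of-fit test for the beta family.
# The KS statistic is a stand-in for the paper's L2-type statistic.
import numpy as np
from scipy import stats

def beta_gof_pvalue(x, n_boot=500, seed=0):
    rng = np.random.default_rng(seed)
    a, b, _, _ = stats.beta.fit(x, floc=0, fscale=1)      # MLE with support fixed to (0, 1)
    t_obs = stats.kstest(x, "beta", args=(a, b)).statistic
    t_boot = np.empty(n_boot)
    for r in range(n_boot):
        xb = stats.beta.rvs(a, b, size=len(x), random_state=rng)
        ab, bb, _, _ = stats.beta.fit(xb, floc=0, fscale=1)  # re-estimate on each bootstrap sample
        t_boot[r] = stats.kstest(xb, "beta", args=(ab, bb)).statistic
    return np.mean(t_boot >= t_obs)                        # bootstrap p-value

x = stats.beta.rvs(2.0, 5.0, size=200, random_state=1)
print(beta_gof_pvalue(x))
```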
-
Bayesian Inference for Polycrystalline Materials Stat (IF 0.766) Pub Date : 2020-11-27 James Matuk; Oksana Chkrebtii; Stephen Niezgoda
Polycrystalline materials, such as metals, are comprised of heterogeneously oriented crystals. Observed crystal orientations are modelled as a sample from an orientation distribution function (ODF), which determines a variety of material properties and is therefore of great interest to practitioners. Observations consist of quaternions, 4‐dimensional unit vectors reflecting both orientation and rotation
-
Creating optimal conditions for reproducible data analysis in R with ‘fertile’† Stat (IF 0.766) Pub Date : 2020-11-26 Audrey M. Bertin; Benjamin S. Baumer
The advancement of scientific knowledge increasingly depends on ensuring that data‐driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation and no clear consensus on standards of what constitutes reproducibility
-
Data Visualization Case Studies for High‐Dimensional Data Validation Stat (IF 0.766) Pub Date : 2020-11-26 Aaron R. Williams
Microsimulation and synthetic data are often high‐dimensional, requiring extensive validation and exploration to compare results against certain benchmarks. In both cases, validation is necessary to ensure that the many univariate distributions and multivariate relationships in the new data are similar to the many univariate distributions and multivariate relationships in the underlying data. This
-
Functional Singular Spectrum Analysis Stat (IF 0.766) Pub Date : 2020-11-25 Hossein Haghbin; Seyed Morteza Najibi; Rahim Mahmoudvand; Jordan Trinka; Mehdi Maadooliat
In this paper, we develop a new extension of the Singular Spectrum Analysis (SSA) called functional SSA to analyze functional time series. The new methodology is constructed by integrating ideas from functional data analysis and univariate SSA. Specifically, we introduce a trajectory operator in the functional world, which is equivalent to the trajectory matrix in the regular SSA. In the regular SSA
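In the regular (univariate) SSA that the functional extension mirrors, the key objects are the trajectory (Hankel) matrix, its SVD, and diagonal averaging. A minimal sketch of that classical pipeline, not the functional version, is below; the window length and number of components are illustrative.

```python
# Hedged sketch of classical univariate SSA: trajectory matrix, SVD,
# and reconstruction of the leading components by diagonal averaging.
import numpy as np

def ssa_reconstruct(y, window, n_components):
    n = len(y)
    k = n - window + 1
    X = np.column_stack([y[i:i + window] for i in range(k)])   # trajectory (Hankel) matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]  # rank-r approximation
    # Diagonal (Hankel) averaging back to a series of length n.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += Xr[:, j]
        counts[j:j + window] += 1
    return recon / counts

t = np.linspace(0, 10, 500)
y = np.sin(2 * np.pi * t) + 0.3 * np.random.default_rng(0).normal(size=len(t))
trend_plus_cycle = ssa_reconstruct(y, window=50, n_components=2)
```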
-
Likelihood‐Based Inference for Generalized Linear Mixed Models: Inference with the R Package glmm Stat (IF 0.766) Pub Date : 2020-11-25 Christina Knudson; Sydney Benson; Charles Geyer; Galin Jones
The R package glmm enables likelihood‐based inference for generalized linear mixed models with a canonical link. No other publicly‐available software accurately conducts likelihood‐based inference for generalized linear mixed models with crossed random effects. glmm is able to do so by approximating the likelihood function and two derivatives using importance sampling. The importance sampling distribution
-
Statistical Significance Calculations for Scenarios in Visual Inference Stat (IF 0.766) Pub Date : 2020-11-25 Susan Vanderplas; Christian Röttger; Dianne Cook; Heike Hofmann
Statistical inference provides the protocols for conducting rigorous science, but data plots provide the opportunity to discover the unexpected. These disparate endeavors are bridged by visual inference, where a lineup protocol can be employed for statistical testing. Human observers are needed to assess the lineups, typically using a crowd‐sourcing service. This paper describes a new approach for
-
Statistical Inference for Nonparametric Censored Regression Stat (IF 0.766) Pub Date : 2020-11-23 Guangcai Mao; Jing Zhang
Nonparametric regression is of primary importance in many statistical applications. For data with censored outcomes, how to construct a confidence band for the regression function is a basic issue that has received limited research. We propose a procedure to construct pointwise and simultaneous confidence bands for the regression function based on a debiased estimator, which is obtained by correcting the bias
-
Hosting a Data Science Hackathon with Limited Resources Stat (IF 0.766) Pub Date : 2020-11-23 Kristin Kuter; Christopher Wedrychowicz
In this paper we will detail our experiences developing and organizing an annual machine learning competition at Saint Mary’s College. We will detail our process of collecting data for the competition as well as the logistical challenges faced when hosting such an event at a small liberal arts college. We believe that this report will be of interest to colleagues teaching data science at institutions
-
Modern Multiple Imputation with Functional Data Stat (IF 0.766) Pub Date : 2020-11-23 Aniruddha Rajendra Rao; Matthew Reimherr
This work considers the problem of fitting functional models with sparsely and irregularly sampled functional data. It overcomes the limitations of the state‐of‐the‐art methods, which face major challenges in the fitting of more complex non‐linear models. Currently, many of these models cannot be consistently estimated unless the number of observed points per curve grows sufficiently quickly with the
-
Modernizing k‐Nearest Neighbors Stat (IF 0.766) Pub Date : 2020-11-23 Robin Elizabeth Yancey; Bochao Xin; Norm Matloff
The k‐nearest neighbors (k‐NN) method is one of the oldest statistical/machine learning techniques. It is included in virtually every major package, such as caret, parsnip, mlr3 and scikit‐learn. Yet those packages do not go beyond the basics. With today's high‐speed computation capability, k‐NN can be made much more powerful. Here we present directions in which that can be done:
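For orientation, the baseline that those packages provide, and that the paper proposes to go beyond, is plain k-NN classification; a minimal scikit-learn example on a built-in dataset:

```python
# Baseline k-NN classification with scikit-learn; the paper's proposed
# extensions are not shown here.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)
print("test accuracy:", knn.score(X_te, y_te))
```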
-
Closed‐form Expressions for Maximum Mean Discrepancy with Applications to Wasserstein Auto‐Encoders Stat (IF 0.766) Pub Date : 2020-11-17 Raif M. Rustamov
The Maximum Mean Discrepancy (MMD) has found numerous applications in statistics and machine learning, most recently as a penalty in the Wasserstein Auto‐Encoder (WAE). In this paper we compute closed‐form expressions for estimating the Gaussian kernel based MMD between a given distribution and the standard multivariate normal distribution. This formula reveals a connection to the Baringhaus‐Henze‐Epps‐Pulley
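For reference, the squared MMD between distributions P and Q under a kernel k is given below; the paper's contribution is a closed form for the Gaussian-kernel case with Q the standard multivariate normal.

```latex
\mathrm{MMD}^2(P, Q)
  \;=\; \mathbb{E}_{x, x' \sim P}\, k(x, x')
      + \mathbb{E}_{y, y' \sim Q}\, k(y, y')
      - 2\, \mathbb{E}_{x \sim P,\, y \sim Q}\, k(x, y).
```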
-
Nested model averaging on solution path for high‐dimensional linear regression Stat (IF 0.766) Pub Date : 2020-09-24 Yang Feng; Qingfeng Liu
We study the nested model averaging method on the solution path for a high‐dimensional linear regression problem. In particular, we propose to combine model averaging with regularized estimators (e.g., lasso, elastic net, and Sorted L‐One Penalized Estimation [SLOPE]) on the solution path for high‐dimensional linear regression. In simulation studies, we first conduct a systematic investigation on the
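A rough sketch of the general idea: fit a lasso path, treat the fits along the path as candidate (nested) models, and average their predictions with data-driven weights. The softmax-of-validation-error weighting below is purely illustrative, not the authors' criterion.

```python
# Hedged sketch: averaging the fits along a lasso solution path.
import numpy as np
from sklearn.linear_model import lasso_path
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 300, 100
X = rng.normal(size=(n, p))
beta = np.r_[3.0, -2.0, 1.5, np.zeros(p - 3)]
y = X @ beta + rng.normal(size=n)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Centre so the path can be computed without an intercept.
x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
alphas, coefs, _ = lasso_path(X_tr - x_mean, y_tr - y_mean, n_alphas=50)  # coefs: (p, n_alphas)

val_pred = (X_val - x_mean) @ coefs + y_mean           # one prediction column per path model
val_mse = ((val_pred - y_val[:, None]) ** 2).mean(axis=0)
w = np.exp(-(val_mse - val_mse.min()))                 # simple softmax-style weights
w /= w.sum()
averaged_val_pred = val_pred @ w                       # model-averaged prediction
```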
-
Semi‐supervised logistic learning based on exponential tilt mixture models Stat (IF 0.766) Pub Date : 2020-09-04 Xinwei Zhang; Zhiqiang Tan
Consider semi‐supervised learning for classification, where both labelled and unlabelled data are available for training. The goal is to exploit both datasets to achieve higher prediction accuracy than just using labelled data alone. We develop a semi‐supervised logistic learning method based on exponential tilt mixture models by extending a statistical equivalence between logistic regression and exponential
-
Forecasting subnational COVID‐19 mortality using a day‐of‐the‐week adjusted Bayesian hierarchical model Stat (IF 0.766) Pub Date : 2020-11-06 Justin J. Slater; Patrick E. Brown; Jeffrey S. Rosenthal
As of October 2020, the death toll from the COVID‐19 pandemic has risen to over 1.1 million deaths worldwide. Reliable estimates of mortality due to COVID‐19 are important to guide intervention strategies such as lockdowns and social distancing measures. In this paper, we develop a data‐driven model that accurately and consistently estimates COVID‐19 mortality at the regional level early in the epidemic
-
Gradual Variance Change Point Detection with A Smoothly‐changing Mean Trend Stat (IF 0.766) Pub Date : 2020-11-03 Wanfeng Liang; Libai Xu
In contrast to the analysis of abrupt changes, methods for detecting gradual change points are less developed. In this paper, we are interested in the scenario where the variance of the data may vary gradually while the mean changes in a smooth fashion. We propose a penalized weighted least squares approach with an iterative estimation procedure to detect the gradual variance change point with smoothly‐changing
-
Mann‐Whitney Test for Two‐phase Stratified Sampling Stat (IF 0.766) Pub Date : 2020-10-30 Takumi Saegusa
We consider the Mann‐Whitney test for two‐phase stratified sampling. In this design, an i.i.d. sample is obtained at the first phase and then stratified based on auxiliary variables. At the second phase, stratified subsamples are obtained without replacement to collect variables of interest. The resultant data form a biased and dependent sample due to stratification and sampling without replacement.
-
VtNet: a Neural Network with Variable Importance Assessment Stat (IF 0.766) Pub Date : 2020-10-30 Lixiang Zhang; Lin Lin; Jia Li
The architectures of many neural networks rely heavily on the underlying grid associated with the variables, for instance, the lattice of pixels in an image. For general biomedical data without a grid structure, the multi‐layer perceptron (MLP) and deep belief network (DBN) are often used. However, in these networks, variables are treated homogeneously in the sense of network structure; and it is difficult
-
A family of parsimonious mixtures of multivariate Poisson‐lognormal distributions for clustering multivariate count data Stat (IF 0.766) Pub Date : 2020-08-25 Sanjeena Subedi; Ryan P. Browne
Multivariate count data are commonly encountered through high‐throughput sequencing technologies in bioinformatics, text mining, or sports analytics. Although the Poisson distribution seems a natural fit to these count data, its multivariate extension is computationally expensive. In most cases, mutual independence among the variables is assumed; however, this fails to take into account the correlation
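The building block of these mixtures is the multivariate Poisson-lognormal distribution, which in its standard form couples conditionally independent Poisson counts through a latent Gaussian vector; the paper's parsimonious family then constrains the covariance structure.

```latex
% Counts are conditionally independent Poisson given a latent log-normal intensity:
Y_{j} \mid \lambda_{j} \;\overset{\text{ind}}{\sim}\; \mathrm{Poisson}(\lambda_{j}),
\qquad
\log \boldsymbol{\lambda} = (\log\lambda_{1}, \dots, \log\lambda_{p})^{\top}
  \;\sim\; \mathcal{N}_{p}(\boldsymbol{\mu}, \boldsymbol{\Sigma}).
```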
-
Randomized estimation of functional covariance operator via subsampling Stat (IF 0.766) Pub Date : 2020-08-22 Shiyuan He; Xiaomeng Yan
Covariance operators are fundamental concepts and modelling tools for many functional data analysis methods, such as functional principal component analysis. However, the empirical (or estimated) covariance operator becomes too costly to compute when the functional dataset gets big. This paper studies a randomized algorithm for covariance operator estimation. The algorithm works by sampling and rescaling
-
On the non‐asymptotic and sharp lower tail bounds of random variables Stat (IF 0.766) Pub Date : 2020-09-12 Anru R. Zhang; Yuchen Zhou
The non‐asymptotic tail bounds of random variables play crucial roles in probability, statistics, and machine learning. Despite much success in developing upper bounds on tail probabilities in the literature, results on lower bounds for tail probabilities are comparatively few. In this paper, we introduce systematic and user‐friendly schemes for developing non‐asymptotic lower bounds of tail probabilities. In addition
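As a classical point of comparison, and not one of the authors' schemes, the Paley–Zygmund inequality gives a non-asymptotic lower tail bound for a non-negative random variable X with finite second moment and any 0 ≤ θ < 1:

```latex
\mathbb{P}\!\left(X > \theta\, \mathbb{E}[X]\right)
  \;\ge\; (1-\theta)^{2}\, \frac{\bigl(\mathbb{E}[X]\bigr)^{2}}{\mathbb{E}[X^{2}]}.
```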
-
Causal inference in the presence of missing data using a random forest based matching algorithm Stat (IF 0.766) Pub Date : 2020-10-23 Tristan Hillis; Maureen A. Guarcello; Richard A. Levine; Juanjuan Fan
Observational studies require matching across groups over multiple confounding variables. Across the literature, matching algorithms fail to handle this issue; as a result, missing values are regularly imputed prior to being considered in the matching process. However, imputing is not always practical, forcing us to drop observations due to the deficiency of the chosen algorithm, decreasing the power
-
Directional analysis for point patterns on linear networks Stat (IF 0.766) Pub Date : 2020-10-15 Mehdi Moradi; Jorge Mateu; Carles Comas
Statistical analysis of point processes often assumes that the underlying process is isotropic in the sense that its distribution is invariant under rotation. For point processes on ℝ², some tests based on the K‐ and nearest neighbour orientation functions have been proposed to check such an assumption. However, anisotropy and directional analysis need proper caution when dealing with point processes
-
Bayesian Group Learning for Shot Selection of Professional Basketball Players Stat (IF 0.766) Pub Date : 2020-10-15 Guanyu Hu; Hou‐Cheng Yang; Yishu Xue
In this paper, we develop a group learning approach to analyze the underlying heterogeneity structure of shot selection among professional basketball players in the NBA. We propose a mixture of finite mixtures (MFM) model to capture the heterogeneity of shot selection among different players based on Log Gaussian Cox process (LGCP). Our proposed method can simultaneously estimate the number of groups
-
Self‐Supervised Learning for Outlier Detection Stat (IF 0.766) Pub Date : 2020-10-14 Jan Diers; Christian Pigorsch
The identification of outliers is mainly based on unannotated data and therefore constitutes an unsupervised problem. The lack of a label leads to numerous challenges that do not occur or only occur to a lesser extent when using annotated data and supervised methods. In this paper, we focus on two of these challenges: the selection of hyperparameters and the selection of informative features. To this
-
Linear screening for high‐dimensional computer experiments Stat (IF 0.766) Pub Date : 2020-10-02 Chunya Li; Daijun Chen; Shifeng Xiong
In this paper we propose a linear variable screening method for computer experiments when the number of input variables is larger than the number of runs. This method uses a linear model to approximate the nonlinear data and screens the important variables using existing screening methods for linear models. When the underlying simulator is nearly sparse, we prove that the linear screening method is asymptotically
-
Semi‐supervised joint learning for longitudinal clinical events classification using neural network models Stat (IF 0.766) Pub Date : 2020-08-11 Weijing Tang; Jiaqi Ma; Akbar K. Waljee; Ji Zhu
The success of deep learning neural network models often relies on the accessibility of a large number of labelled training data. In many health care settings, however, only a small number of accurately labelled data are available while unlabelled data are abundant. Further, input variables such as clinical events in the medical settings are usually of longitudinal nature, which poses additional challenges
-
Noisy low‐rank matrix completion under general bases Stat (IF 0.766) Pub Date : 2020-07-28 Lei Shi; Changliang Zou
In this paper, we consider the low‐rank matrix completion problem under general bases, which intends to recover a structured matrix via a linear combination of prespecified bases. Existing works focus primarily on orthonormal bases; however, it is often necessary to adopt nonorthonormal bases in some real applications. Thus, there is a great need to address the feasibility of some popular estimators
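A common estimator in this setting is nuclear-norm penalized least squares over the linear measurements defined by the bases; the generic form is below, and the paper's question is how such estimators behave when the bases B_i are not orthonormal.

```latex
% Observations y_i = \langle B_i, M^{*} \rangle + \varepsilon_i with prespecified bases B_i:
\widehat{M} \;=\; \arg\min_{M}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \langle B_i, M\rangle\bigr)^{2}
  \;+\; \lambda\, \|M\|_{*}.
```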
-
Visual Tests for Elliptically Symmetric Distributions Stat (IF 0.766) Pub Date : 2020-09-24 Pritha Guha; Biman Chakraborty
We propose a visual test of goodness of fit for families of elliptically symmetric distributions based on a test statistic derived from scale‐scale plots. The scale‐scale plots are constructed based on the volume functionals of the central rank regions. The test is motivated through the multivariate normal distributions, and extended to a test of elliptical symmetry. We derive the asymptotic properties
-
Nonasymptotic support recovery for high dimensional sparse covariance matrices Stat (IF 0.766) Pub Date : 2020-09-19 Adam B. Kashlak; Linglong Kong
For high dimensional data, the standard empirical estimator for the covariance matrix is very poor, and thus many methods have been proposed to more accurately estimate the covariance structure of high dimensional data. In this article, we consider estimation under the assumption of sparsity, but regularize with respect to the individual false positive rate for incorrectly including a matrix entry
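Below is a minimal sketch of entrywise thresholding of the sample covariance, the generic device behind sparse covariance estimation; the universal threshold used here is a textbook choice, not the paper's individual false-positive-rate calibration.

```python
# Hedged sketch: hard-thresholded sample covariance for sparse estimation.
# The universal threshold c * sqrt(log p / n) is a textbook choice.
import numpy as np

def threshold_covariance(X, c=1.0):
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    lam = c * np.sqrt(np.log(p) / n)
    S_hat = np.where(np.abs(S) >= lam, S, 0.0)   # zero out small off-diagonal entries
    np.fill_diagonal(S_hat, np.diag(S))          # keep the diagonal untouched
    return S_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
S_sparse = threshold_covariance(X, c=0.5)
```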
-
Expectile Regression via Deep Residual Networks Stat (IF 0.766) Pub Date : 2020-09-18 Yiyi Yin; Hui Zou
Expectile is a generalization of the expected value in probability and statistics. In finance and risk management, the expectile is considered to be an important risk measure due to its connection with gain‐loss ratio and its coherent and elicitable properties. Linear multiple expectile regression was proposed in 1987 for estimating the conditional expectiles of a response given a set of covariates
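For reference, the τ-expectile is defined through an asymmetric squared loss (the 1987 proposal mentioned above is the linear conditional version of this); the paper estimates the conditional expectile with deep residual networks.

```latex
\rho_{\tau}(u) \;=\; \bigl|\tau - \mathbb{1}\{u < 0\}\bigr|\, u^{2},
\qquad
m_{\tau}(Y) \;=\; \arg\min_{m}\; \mathbb{E}\bigl[\rho_{\tau}(Y - m)\bigr].
```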
-
Mixed effects envelope models Stat (IF 0.766) Pub Date : 2020-09-11 Yuyang Shi; Linquan Ma; Lan Liu
When multiple measures are collected repeatedly over time, redundancy typically exists among responses. The envelope method was recently proposed to reduce the dimension of responses without loss of information in regression with multivariate responses. It can gain substantial efficiency over the standard least squares estimator. In this paper, we generalize the envelope method to mixed effects models
-
Deep learning from a statistical perspective Stat (IF 0.766) Pub Date : 2020-08-31 Yubai Yuan, Yujia Deng, Yanqing Zhang, Annie Qu
As one of the most rapidly developing artificial intelligence techniques, deep learning has been applied in various machine learning tasks and has received great attention in data science and statistics. Regardless of the complex model structure, deep neural networks can be viewed as a nonlinear and nonparametric generalization of existing statistical models. In this review, we introduce several popular
-
Cross‐dimple in the cross‐covariance functions of bivariate isotropic random fields on spheres Stat (IF 0.766) Pub Date : 2020-08-27 Alfredo Alegría
Multivariate random fields allow the simultaneous modelling of multiple spatially indexed variables, playing a fundamental role in geophysical, environmental, and climate disciplines. This paper introduces the concept of cross‐dimple for bivariate isotropic random fields on spheres and proposes an approach to build parametric models that possess this attribute. Our findings are based on the spectral representation
-
Exponential family tensor completion with auxiliary information Stat (IF 0.766) Pub Date : 2020-08-24 Jichen Yang, Nan Zhang
Tensor completion is among the most important tasks in tensor data analysis, which aims to fill the missing entries of a partially observed tensor. In many real applications, non‐Gaussian data such as binary or count data are frequently collected. Thus, it is inappropriate to assume that observations are normally distributed and formulate tensor completion with least squares based approaches. In this
-
Deep fiducial inference Stat (IF 0.766) Pub Date : 2020-08-16 Gang Li; Jan Hannig
Since the mid‐2000s, there has been a resurrection of interest in modern modifications of fiducial inference. To date, the main computational tool to extract a generalized fiducial distribution is Markov chain Monte Carlo (MCMC). We propose an alternative way of computing a generalized fiducial distribution that could be used in complex situations. In particular, to overcome the difficulty when the
-
Robust inference for nonlinear regression models from the Tsallis score: application to COVID-19 contagion in Italy Stat (IF 0.766) Pub Date : 2020-08-12 Paolo Girardi; Luca Greco; Valentina Mameli; Monica Musio; Walter Racugno; Erlis Ruli; Laura Ventura
We discuss an approach to robust fitting of non‐linear regression models, in both frequentist and Bayesian frameworks, which can be employed to model and predict the contagion dynamics of the coronavirus disease 2019 (COVID‐19) in Italy. The focus is on the analysis of epidemic data using robust dose–response curves, but the functionality is applicable to arbitrary non‐linear regression models.
-
A Bayesian non‐parametric approach for automatic clustering with feature weighting Stat (IF 0.766) Pub Date : 2020-08-11 Debolina Paul; Swagatam Das
Despite being well‐known problems, feature weighting and feature selection remain a major predicament for clustering. Most of the algorithms that provide weighting or selection of features require the number of clusters to be known in advance. On the other hand, the existing automatic clustering procedures that can determine the number of clusters are computationally expensive and often do not make
-
Disjunct support spike‐and‐slab priors for variable selection in regression under quasi‐sparseness Stat (IF 0.766) Pub Date : 2020-08-11 Daniel Andrade; Kenji Fukumizu
Sparseness of the regression coefficient vector is often a desirable property, because, among other benefits, sparseness improves interpretability. In practice, many true regression coefficients might be negligibly small, but nonzero, which we refer to as quasi‐sparseness. Spike‐and‐slab priors can be tuned to ignore very small regression coefficients and, as a consequence, provide a trade‐off between
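The baseline the paper starts from is the standard spike-and-slab prior on each regression coefficient, written below in its point-mass form; the "disjunct support" construction modifies this baseline.

```latex
% Standard (point-mass) spike-and-slab prior on each regression coefficient:
\beta_{j} \;\sim\; (1 - \pi)\,\delta_{0}(\beta_{j})
  \;+\; \pi\, \mathcal{N}\!\bigl(\beta_{j};\, 0, \tau^{2}\bigr),
\qquad j = 1, \dots, p.
```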
-
Sparse nonparametric regression with regularized tensor product kernel Stat (IF 0.766) Pub Date : 2020-08-11 Hang Yu, Yuanjia Wang, Donglin Zeng
With growing interest in using black‐box machine learning for complex data with many feature variables, it is critical to obtain a prediction model that only depends on a small set of features to maximize generalizability. Therefore, feature selection remains an important and challenging problem in modern applications. Most of the existing methods for feature selection are based on either parametric
-
Model checking for parametric single‐index quantile models Stat (IF 0.766) Pub Date : 2020-08-06 Liangliang Yuan; Wenhui Liu; Xuemin Zi; Zhaojun Wang
In this work, we construct a lack‐of‐fit test for testing parametric single‐index quantile regression models. We apply the kernel smoothing technique for the multivariate nonparametric estimation involved in this task. To avoid the “curse of dimensionality” in multivariate nonparametric estimation and to fully utilize the information contained in the model, we employ a sufficient dimension reduction
-
Small run size design for model identification in 3m factorial experiments Stat (IF 0.766) Pub Date : 2020-08-04 Fariba Z. Labbaf, Hooshang Talebi
An active interaction in a main effect plan may cause biased estimation of the parameters in an analysis of variance (ANOVA) model. A fractional factorial design (FFD) with higher‐order resolution can resolve the alias problem, however at the cost of a considerable number of runs. Alternatively, a search design (SD), the so‐called main effect plus k plan (MEP.k), with far fewer runs than an FFD, is able
-
Mixture modelling of categorical sequences with secondary components Stat (IF 0.766) Pub Date : 2020-07-30 Xuwen Zhu
In this paper, the forward selected first‐order Markov mixture (FSFOMM) is proposed for modelling heterogeneous categorical sequences with secondary components capable of detecting outlying sequences within each cluster. Such sequences are assumed to have different transition probabilities in certain states. The model provides an attractive and flexible tool for diagnostics of unusual behaviours and
Contents have been reproduced by permission of the publishers.