-
Fallacy of Data-Selective Inference in Modelling Networks Stat (IF 2.451) Pub Date : 2022-08-03 Stefan Stein, Chenlei Leng
Recent years have seen a growing array of activities in developing statistical models for modelling real-life networks. Since many of these networks are sparse, an all-too-common practice in the literature is to apply a developed model to a sub-network, typically obtained by discarding nodes due to their lack of connectivity. In this note, we provide the first result highlighting issues with this practice which
-
Advice for Isolated Statisticians Collaborating in Academic Healthcare Center Settings Stat (IF 2.451) Pub Date : 2022-08-03 C. Christina Mehta, Margaret R. Stedman, Sowmya R. Rao, Robert Podolsky
A substantial number of statisticians work in isolated domain science departments without access to support networks and resources typical of larger statistical units. These isolated statisticians face many challenges including limited professional networks, non-traditional roles with idiosyncratic expectations and unique career paths. Furthermore, "the curse of success" lies ahead, as success of the
-
High Dimensional Nonparametric Tests For Linear Asset Pricing Models Stat (IF 2.451) Pub Date : 2022-08-02 Ping Zhao, Dachuan Chen, Xuemin Zi
This paper develops a novel nonparametric test for testing the high-dimensional alpha in linear asset pricing models, where the number of securities can be much larger than the time-dimension of the return series. The asymptotic null distribution and the local power property are established for a class of weighted spatial sign tests, resulting in an optimal test INST by choosing the weight function
-
Alternative parameterization approaches for modeling risk assessment in the presence of imbalance: an example using parotid malignancy Stat (IF 2.451) Pub Date : 2022-07-27 Kaming Lo, Shari Messinger, Christopher Fundakowski, Zoukaa Sargi
In collaborative statistics, logistic regression models are commonly used with binary outcomes and reference cell coding for categorical predictors. However, despite the usefulness of reference cell coding schemes under many investigative objectives, it is not always appropriate to address
-
Multiple Third-Variable Analysis for Competing-Risk Data - With an Application to Explore Racial Disparity in Breast Cancer Recurrence Stat (IF 2.451) Pub Date : 2022-07-14 Q. Yu, L. Zhu, L. Zhang, M. Hsieh, X. Wu, B. Li
There are many racial and ethnic disparities in cancer outcomes. Through special studies supported by CDC, we found that compared with Caucasians, African-American women with breast cancer were more likely to have cancer recurrences. We are interested in exploring this racial disparity by identifying risk factors that contribute to the disparity and quantifying their effects. Cancer may recur after a
-
Bayesian group sequential designs for cluster-randomized trials Stat (IF 2.451) Pub Date : 2022-07-08 Junwei Shen, Shirin Golchi, Erica E. M. Moodie, David Benrimoh
Flexible approaches have been proposed for individually randomized trials to save time or reduce sample size. However, flexible designs for cluster-randomized trials in which groups of participants rather than individuals are randomized to treatment arms are less common. Motivated by a cluster-randomized trial designed to assess the effectiveness of a machine-learning based clinical decision support
-
Higher order asymptotic refinements in Bell regressions Stat (IF 2.451) Pub Date : 2022-07-05 Artur J. Lemonte
The discrete Bell distribution and its associated regression model were introduced recently in the statistical literature. The Bell distribution has proved to be a useful alternative to the traditional Poisson distribution, mainly to deal with overdispersion. Likelihood-based inference on the Bell regression parameters relies on asymptotic assumptions like the sample size going to infinity. In this
-
On the asymptotic properties of a bagging estimator with a massive dataset Stat (IF 2.451) Pub Date : 2022-06-30 Yuan Gao, Riquan Zhang, Hansheng Wang
Bagging is a useful method for large-scale statistical analysis, especially when the computing resources are very limited. We study here the asymptotic properties of bagging estimators for M-estimation problems but with massive datasets. We theoretically prove that the resulting estimator is consistent and asymptotically normal under appropriate conditions. The results show that the bagging estimator
-
Consistency results of the M-regression function estimator for stationary continuous-time and ergodic data Stat (IF 2.451) Pub Date : 2022-06-27 Fatiha Mokhtari, Rachida Rouane, Saâdia Rahmani, Mustapha Rachdi
This paper is devoted to the study of the asymptotic properties of the kernel estimator of the robust regression function for stationary continuous-time and ergodic data. Such a dependence structure is an alternative to the strong mixing conditions usually assumed in functional time series analysis. More precisely, we consider the kernel type estimator of the robust regression function constructed from
-
The Academic Collaborative Statistician: Research, Training, and Evaluation Stat (IF 2.451) Pub Date : 2022-06-16 Emily H. Griffith, Julia L. Sharp, William C. Bridges, Bruce A. Craig, Kathryn J. Hanford, John R. Stevens
Statisticians with a primary collaborative position at their academic institution have a unique role in combining domain science and statistical research, and training and mentoring graduate students. As a result, it is important that their responsibilities be clearly understood by both the collaborative statisticians and their departments. In this paper, we discuss various steps that these statisticians
-
Minimax Optimal High-Dimensional Classification using Deep Neural Networks Stat (IF 2.451) Pub Date : 2022-06-15 Shuoyang Wang, Zuofeng Shang
High-dimensional classification is a fundamentally important research problem in high-dimensional data analysis. In this paper, we derive a nonasymptotic rate for the minimax excess misclassification risk when the feature dimension diverges exponentially with the sample size and the Bayes classifier possesses a complicated modular structure. We also show that classifiers based on deep neural networks attain
-
Collaborative biostatistics and epidemiology in academic medical centers: A survey to assess relationships with health researchers and ethical implications Stat (IF 2.451) Pub Date : 2022-06-14 Katrina L. Devick, Heather J. Gunn, Lori Lyn Price, Jareen K. Meinzen-Derr, Felicity T. Enders, Susan M. Perkins, Phillip J. Schulte
The role of collaborative biostatisticians and epidemiologists in academic medical centers, and how their degree type, supervisor type, and sex influence recognition and feelings of respect, are poorly understood. We conducted a cross-sectional survey of self-identified biostatisticians and epidemiologists working in academic medical centers in the US or Canada. The survey was sent to 341 contacts at
-
Sparse covariance matrix estimation for ultrahigh dimensional data Stat (IF 2.451) Pub Date : 2022-06-10 Wanfeng Liang, Yue Wu, Hui Chen
We introduce a Covariance matrix Refitted Cross Validation (CovRCV) estimation procedure, without requiring the Gaussian assumption. Specifically, we first use the modified Cholesky decomposition (MCD) to transform covariance matrix estimation into coefficient estimation in the regression setting. Then we use an estimation method based on RCV to mitigate the prevalent spurious correlation in the ultrahigh
-
The Virtual Consulting Company — Teaching Statistical Consulting Through Simulated Experience Stat (IF 2.451) Pub Date : 2022-06-10 David Shilane, Nicole L. Lorenzetti, Nicole Di Crecchio, David K. Kreutter
Training students to become effective consultants is an important goal of statistical education. The pedagogical models include drop-in consulting, workshops, long-term projects, and theory-based courses based on case studies. This study performs a comparison of the curricular designs. We then introduce a virtual model for a statistical consulting course based on simulated experience. The virtual model
-
On the effect of rounding on hypothesis testing when sample size is large Stat (IF 2.451) Pub Date : 2022-06-09 N. G. Ushakov, V. G. Ushakov
It is well known that sample moments are more sensitive to outliers, and hence less robust, than order statistics. In this article, we show that the situation is exactly the opposite for robustness with respect to rounding. For large and very large sample sizes, statistical procedures based on order statistics become non-applicable even for very mild data rounding while procedures
-
Family-wise error rate control in Gaussian graphical model selection via Distributionally Robust Optimization Stat (IF 2.451) Pub Date : 2022-06-05 Chau Tran, Pedro Cisneros-Velarde, Sang-Yun Oh, Alexander Petersen
Recently, a special case of precision matrix estimation based on a distributionally robust optimization (DRO) framework has been shown to be equivalent to the graphical lasso. From this formulation, a method for choosing the regularization term, i.e., for graphical model selection, was proposed. In this work, we establish a theoretical connection between the confidence level of graphical model selection
-
An Approach of Bayesian Variable Selection for Ultrahigh Dimensional Multivariate Regression Stat (IF 2.451) Pub Date : 2022-05-31 Xiaotian Dai, Guifang Fu, Randall Reese, Shaofei Zhao, Zuofeng Shang
In many applications, scientists are particularly interested in detecting which of the predictors are truly associated with a multivariate response. It is more accurate to model multiple responses as one vector rather than separating each component one by one. This is particularly true for complex traits having multiple correlated components. A Bayesian multivariate variable selection (BMVS) approach
-
A divide-and-conquer algorithm for core-periphery identification in large networks Stat (IF 2.451) Pub Date : 2022-05-26 Eric Yanchenko
Core-periphery structure is an important network feature where the network is broken into two components: a densely-connected core and a loosely-connected periphery. In this work, we propose a divide-and-conquer algorithm to identify core-periphery structure in large networks. By finding this structure on much smaller sub-samples of the network and then combining the results across sub-samples, this
-
Test of Independence for Hilbertian Random Variables Stat (IF 2.451) Pub Date : 2022-05-23 Bilol Banerjee, Anil K. Ghosh
In this article, we propose a test of independence for functional random variables modelled as elements of Hilbert spaces. First, we provide a general recipe for constructing measures of dependence among multiple random functions. These measures are non-negative and under fairly general assumptions, they take the value zero only when the functions are independent. We consider one such measure based
-
Flexible multivariate zero to k inflated power series regression model with applications Stat (IF 2.451) Pub Date : 2022-05-19 Hadi Saboori, Mahdi Doostparast
Inflated distributions are applied in various fields, including insurance, traffic networks, and survival analyses. First, they are defined by a baseline discrete distribution, and then extra masses are added to some points of interest, called inflated points, to achieve more flexible models for data analyses. The baseline distribution is arbitrary and application-dependent. Here, the rich family of
-
Asking Great Questions Stat (IF 2.451) Pub Date : 2022-05-05 Eric A. Vance, Ilana M. Trumble, Jessica L. Alzen, Heather S. Smith
The questions we ask and how we ask them will make a difference in how successful we are in meetings, in collaborations, and in our careers as statisticians and data scientists. What makes a question good and what makes a good question great? Great questions elicit information useful for accomplishing the tasks of a project and strengthen the statistician-domain expert relationship. Great questions
-
The development of a mobile app-focused deduplication strategy for the Apple Heart Study that informs recommendations for future digital trials Stat (IF 2.451) Pub Date : 2022-05-04 Ariadna Garcia, Justin Lee, Vidhya Balasubramanian, Rebecca Gardner, Santosh E. Gummidipundi, Grace Hung, Todd Ferris, Lauren Cheung, Sumbul Desai, Christopher B. Granger, Mellanie True Hills, Peter Kowey, Divya Nag, John S. Rumsfeld, Andrea M. Russo, Jeffrey W. Stein, Nisha Talati, David Tsay, Kenneth W. Mahaffey, Marco V. Perez, Mintu P. Turakhia, Haley Hedlin, Manisha Desai
An app-based clinical trial enrollment process can contribute to duplicated records, carrying data management implications. Our objective was to identify duplicated records in real-time in The Apple Heart Study (AHS).
-
Parametric nonstationary covariance functions on spheres Stat (IF 2.451) Pub Date : 2022-05-01 Lewis R. Blake, Emilio Porcu, Dorit M. Hammerling
Gaussian Processes are powerful tools for modelling spatial data. In this context, a significant amount of modelling focus is placed on specifying the covariance function, which is required to be symmetric and positive definite. Covariance functions have classically been defined and used in Euclidean space. However, as data collected from the globe becomes more prevalent, accounting for Earth's geometry
-
Finite mixture model of hidden Markov regression with covariate dependence Stat (IF 2.451) Pub Date : 2022-05-01 Shuchismita Sarkar, Xuwen Zhu
Recently, the combination of a finite mixture model (FMM) and a hidden Markov model (HMM) has become popular for partitioning heterogeneous temporal data into homogeneous groups (clusters) with homogeneous time points (regimes). The regression mixtures commonly considered in this approach can also accommodate covariates present in the data. The classical fixed covariate approach, however, may not
-
Fast estimators for the mean function for functional data with detection limits Stat (IF 2.451) Pub Date : 2022-04-27 Haiyan Liu, Jeanine Houwing-Duistermaat
In many studies on disease progression, biomarkers are restricted by detection limits, hence informatively missing. Current approaches either ignore the problem by just filling in the value of the detection limit for the missing observations or apply a global approach for estimation of the mean function. The latter is time-consuming for dense data, and the obtained estimate depends on the whole observed
-
Standardized Dempster's non-exact test for high-dimensional mean vectors Stat (IF 2.451) Pub Date : 2022-04-14 Hongyan Fang, Yuanyuan Chen, Ling Chen, Wenzhi Yang, Binyan Jiang
Although Hotelling's $T^2$ test has been a widely used test for hypothesis testing problems on the mean vectors, it is not well defined when the data dimension is larger than the sample size. Dempster's non-exact test, as a remedy for Hotelling's $T^2$ test, is known to be more powerful than Hotelling's $T^2$ test and is well defined even when the dimension is much larger than the sample size
-
A machine learning approach to classification for traders in financial markets Stat (IF 2.451) Pub Date : 2022-03-21 Isaac D. Wright, Matthew Reimherr, John Liechty
We introduce new machine learning methods for clustering traders who are actively trading in a modern electronic exchange which uses a matching engine to track aggregate and individual-level limit order books. Each trader's individual limit order book is centered (with the current best bid and ask prices acting as a central reference), and the patterns in the individual limit order books are identified
-
Criticism as asynchronous collaboration: An example from social science research Stat (IF 2.451) Pub Date : 2022-03-04 Andrew Gelman
I discuss a published paper in political science that made a claim that aroused skepticism. The reanalysis is an example of how we, as consumers as well as producers of science, can engage with published work. This can be viewed as a sort of collaboration performed implicitly between the authors of a published paper and later researchers who want to understand or use the published work.
-
Model-Based Clustering of Semiparametric Temporal Exponential-Family Random Graph Models Stat (IF 2.451) Pub Date : 2022-01-20 Kevin H. Lee, Amal Agarwal, Anna Y. Zhang, Lingzhou Xue
Model-based clustering of time-evolving networks has emerged as one of the important research topics in statistical network analysis. It is a fundamental research question to model time-varying network parameters. However, due to difficulties in modeling functional network parameters, there is little progress in the current literature to model time-varying network parameters effectively. In this work
-
Changing presidential approval: Detecting and understanding change points in interval censored polling data Stat (IF 2.451) Pub Date : 2022-02-07 Jiahao Tian, Michael D. Porter
Understanding how a society views certain policies, politicians, and events can help shape public policy, legislation, and even a political candidate's campaign. This paper focuses on using aggregated, or interval censored, polling data to estimate the times when the public opinion shifts on the US president's job approval. The approval rate is modelled as a Poisson segmented (joinpoint) regression
-
A hierarchical meta-analysis for settings involving multiple outcomes across multiple cohorts Stat (IF 2.451) Pub Date : 2022-01-31 Tugba Akkaya Hocagil, Louise M Ryan, Richard J. Cook, Sandra W. Jacobson, Gale A. Richardson, Nancy L. Day, Claire D. Coles, Heather Carmichael Olson, Joseph L. Jacobson
Evidence from animal models and epidemiological studies has linked prenatal alcohol exposure (PAE) to a broad range of long-term cognitive and behavioural deficits. However, there is a paucity of evidence regarding the nature and levels of PAE associated with increased risk of clinically significant cognitive deficits. To derive robust and efficient estimates of the effects of PAE on cognitive function
-
Covariate-adaptive randomization with variable selection in clinical trials Stat (IF 2.451) Pub Date : 2022-01-26 Hao Zhang, Feifang Hu, Jianxin Yin
In clinical trials and causal inference, it is often critical to balance treatment allocation over influential covariates. In the big data era, the number of covariates is usually very large, and only a small fraction of them are influential to the response variable due to sparsity. However, existing studies assume that all influential covariates are known, fixed and given. In this article, we
-
Dissecting the 2015 Chinese stock market crash Stat (IF 2.451) Pub Date : 2022-01-23 Min Shu, Wei Zhu
We perform a novel analysis of the 2015 Chinese stock market crash by calibrating the log-periodic power law singularity (LPPLS) model to two important Chinese stock indices, SSEC and SZSC, from early 2014 to June 2015. Our analysis indicates that the LPPLS model can readily detect the bubble behaviour of the faster-than-exponential increase corrected by the accelerating logarithm-periodic oscillations
-
Model averaging-based sufficient dimension reduction Stat (IF 2.451) Pub Date : 2022-01-14 Min Cai, Ruige Zhuang, Zhou Yu, Ping Wu
Sufficient dimension reduction is intended to project high-dimensional predictors onto a low-dimensional space without loss of information on the responses. Classical methods, such as sliced inverse regression, sliced average variance estimation and directional regression, are backbones of many modern sufficient dimension reduction methods and have gained considerable research interest. However, the efficiency
-
Equity-weighted bootstrapping: Examples and analysis Stat (IF 2.451) Pub Date : 2022-01-13 Harish S. Bhat, Majerle E. Reeves, Sidra Goldman-Mellor
When faced with severely imbalanced binary classification problems, we often train models on bootstrapped data in which the instances of each class occur in a more favorable ratio, often equal to one. We view algorithmic inequity through the lens of imbalanced classification: In order to balance the performance of a classifier across groups, we can bootstrap to achieve training sets that
-
Causal effect random forest of interaction trees for learning individualized treatment regimes with multiple treatments in observational studies Stat (IF 2.451) Pub Date : 2022-01-13 Luo Li, Richard A. Levine, Juanjuan Fan
Individuals may respond to treatments with significant heterogeneity. To optimize the treatment effect, it is necessary to recommend treatments based on individual characteristics. Existing methods in the literature for learning individualized treatment regimes are usually designed for randomized studies with binary treatments. In this study, we propose an algorithm to extend random forest of interaction
-
K-fold cross-validation for complex sample surveys Stat (IF 2.451) Pub Date : 2022-01-12 Jerzy Wieczorek, Cole Guerin, Thomas McMahon
Although K-fold cross-validation (CV) is widely used for model evaluation and selection, there has been limited understanding of how to perform CV for non-iid data, including those from sampling designs with unequal selection probabilities. We introduce CV methodology that is appropriate for design-based inference from complex survey sampling designs. For such data, we claim that we will tend to make
-
Improving image classification robustness using self-supervision Stat (IF 2.451) Pub Date : 2022-01-12 Ladyna Wittscher, Jan Diers, Christian Pigorsch
Self-supervised learning allows training of neural networks without immense, high-quality or labelled data sets. We demonstrate that self-supervision furthermore improves robustness of models using small, imbalanced or incomplete data sets which pose severe difficulties to supervised models. For small data sets, the accuracy of our approach is up to 12.5% higher using MNIST and 15.2% using Fashion-MNIST
-
A note on the convergence of lift zonoids of measures Stat (IF 2.451) Pub Date : 2022-01-09 František Hendrych, Stanislav Nagy
The lift zonoid is a convenient representation of an integrable measure by a convex set in a higher-dimensional space. It is known that, under appropriate conditions, a uniformly integrable sequence of measures converges weakly if and only if the corresponding sequence of lift zonoids converges in the Hausdorff metric. We provide a new proof of this essential result. Our proof technique allows us to
-
Content and computing outline of two undergraduate Bayesian courses: Tools, examples, and recommendations Stat (IF 2.451) Pub Date : 2022-01-06 Jingchen Hu, Mine Dogucu
Undergraduate Bayesian education is an area that has started getting attention lately. As many educational innovations and articles are published and increasingly more teaching and learning materials are shared, statistics educators might be interested in incorporating Bayesian statistics in their undergraduate statistics and data science curriculum. In this paper, we share a succinct overview of two
-
Chain rules for multivariate cumulant coefficients Stat (IF 2.451) Pub Date : 2022-01-06 Christopher S. Withers, Saralees Nadarajah
Cumulant coefficients are the building blocks of Edgeworth expansions and other analytic methods for statistical inference, such as bias reduction. Suppose that $\hat{\mathbf{w}}$ is a standard estimate of an unknown vector $\mathbf{w}$ and that $\mathbf{t}(\cdot)$ is a smooth function on $\mathbf{w}$. We show that $\hat{\mathbf{t}} = \mathbf{t}(\hat{\mathbf{w}})$ is a standard estimate of $\mathbf{t}(\mathbf{w})$. We give its cumulant coefficients in terms of those of $\hat{\mathbf{w}}$ and the derivatives of
-
Sparse Bayesian predictive modelling of tumour response using radiomic features Stat (IF 2.451) Pub Date : 2022-01-04 Shirin Golchi, Jingyan Fu, Xiaoyang Liu, Eugene Yu, Reza Forghani, Sahir Bhatnagar
We propose a sparse Bayesian hierarchical model for the analysis of data including radiomic features for characterization of head and neck squamous cell carcinoma. The proposed model facilitates radiomic feature selection, handling of missing values in key predictors as well as prediction in a unified framework. The fully Bayesian approach enables adequate incorporation of uncertainty arising from
-
Becoming a JEDI statistician Stat (IF 2.451) Pub Date : 2021-12-28 Stephen T. Ziliak
JEDI stands for justice, equity, diversity and inclusion. JEDI is a global movement, with networks connecting academic, business and grass roots organizations. A definition of ‘JEDI statistics’ and ‘impermissible inequality’ is proposed and illustrated with stories from government work, university teaching and academic research regarding race, ethics and social justice in statistics. I recently had
-
Content-adjusted tolerance intervals via bootstrap calibration Stat (IF 2.451) Pub Date : 2021-12-28 Junjun Jiao, Xu Zhao, Weihu Cheng
Tolerance intervals (TIs) are commonly employed in numerous industries, ranging from engineering to pharmaceuticals. However, closed-form TIs are unavailable for most distributions. Although some approximate methods can be used to obtain TIs, the coverage probabilities (CPs) of these TIs may fail to achieve the nominal level, or may even be far from it. In this study, we propose two
-
Adapting conditional simulation using circulant embedding for irregularly spaced spatial data Stat (IF 2.451) Pub Date : 2021-12-22 Maggie D. Bailey, Soutir Bandyopadhyay, Douglas Nychka
Computing an ensemble of random fields using conditional simulation is an ideal method for retrieving accurate estimates of a field conditioned on available data and for quantifying the uncertainty of these realizations. Methods for generating random realizations, however, are computationally demanding, especially when the estimates are conditioned on numerous observed data and for large domains. In
-
Orthogonal designs with branching and nested factors Stat (IF 2.451) Pub Date : 2021-12-22 Qiao Wei, Min-Qian Liu, Jian-Feng Yang
Computer experiments with branching and nested factors are a common class of computer experiments, but it is challenging to construct designs for this type of experiment. In this paper, we define a special type of design called branching orthogonal Latin hypercube design (BOLHD). Such a design has an appealing structure, that is, no matter at each level of a branching factor or the level-combination
-
Bayesian optimality and intervals for Stein-type estimates Stat (IF 2.451) Pub Date : 2021-12-13 Lingbo Ye, Kenneth Rice
We provide a novel Bayesian decision-theoretic motivation for Stein-type estimates, producing them as an adaptive choice between standard point estimation and estimation that rewards proximity to the origin. Unlike conventional approaches, our arguments provide shrunken estimates under any sampling model or prior. They also lead naturally to a form of credible interval, describing uncertainty about
-
Graph sampling by lagged random walk Stat (IF 2.451) Pub Date : 2021-12-10 Li-Chun Zhang
We propose a family of lagged random walk sampling methods in simple undirected graphs, where transition to the next state (i.e., node) depends on both the current and previous states—hence, lagged. The existing random walk sampling methods can be incorporated as special cases. We develop a novel approach to estimation based on lagged random walks at equilibrium, where the target parameter can be any
-
Some non-parametric regression models for interval-valued functional data Stat (IF 2.451) Pub Date : 2021-12-08 Roya Nasirzadeh, Fariba Nasirzadeh, Zohreh Mohammadi
The interval-valued functional data belong to the category of symbolic data in which the values of each observation consist of two functions, a lower limit function and an upper limit function. This study introduces some functional non-parametric approaches to fit a functional regression model on interval-valued functional data. First, using kernel functions, two non-parametric regression models are
-
Allocation of COVID-19 testing budget on a commute network of counties Stat (IF 2.451) Pub Date : 2021-12-06 Yaxuan Huang, Zheng Tracy Ke, Jiashun Jin
Screening testing is an effective tool to control the early spread of an infectious disease such as COVID-19. When the total testing capacity is limited, we aim to optimally allocate testing resources among n counties. We build a (weighted) commute network on counties, with the weight between two counties a decreasing function of their traffic distance. We introduce a network-based disease model
-
A unified approach for outliers and influential data detection: The value of information in retrospect Stat (IF 2.451) Pub Date : 2021-12-06 Jacob Parsons, Le Bao
Identifying influential and outlying data is important as it would guide the effective collection of future data and the proper use of existing information. We develop a unified approach for outlier detection and influence analysis. Our proposed method is grounded in the intuitive value of information concepts and has a distinct advantage in interpretability and flexibility when compared to existing
-
Stationary distribution and ergodicity of stochastic Wright's mutualism model with regime-switching Stat (IF 2.451) Pub Date : 2021-11-30 Xu Li
We formulate a stochastic Wright's mutualism model with regime-switching in this paper. Under some conditions, we show that the model admits a unique stationary distribution which is ergodic, and the transition probability of the solution of the model exponentially converges to the stationary distribution. Some recent results are extended and improved greatly. An example and several simulations
-
Statistical learning for predicting density–matrix-based electron dynamics Stat (IF 2.451) Pub Date : 2021-11-25 Prachi Gupta, Harish S. Bhat, Karnamohit Ranka, Christine M. Isborn
We consider the problem of learning density-dependent molecular Hamiltonian matrices from time series of electron density matrices, all in the context of Hartree–Fock theory. Prior work developed a solution to this problem for small molecular systems with density and Hamiltonian matrices of size at most 6 × 6. Here, using a battery of techniques, we scale prior methods to larger molecular systems with
-
Multilevel varying coefficient spatiotemporal model Stat (IF 2.451) Pub Date : 2021-11-19 Yihao Li, Danh V. Nguyen, Esra Kürüm, Connie M. Rhee, Sudipto Banerjee, Damla Şentürk
Over 785,000 individuals in the United States have end-stage renal disease (ESRD), with about 70% of patients on dialysis, a life-sustaining treatment. Dialysis patients experience frequent hospitalizations. In order to identify risk factors of hospitalizations, we utilize data from the large national database, United States Renal Data System (USRDS). To account for the hierarchical structure of the
-
A partial EM algorithm for model-based clustering with highly diverse missing data patterns Stat (IF 2.451) Pub Date : 2021-11-06 Ryan P. Browne, Paul D. McNicholas, Christopher J. Findlay
The expectation-maximization (EM) algorithm for incomplete data with highly diverse missing data patterns can be computationally expensive. A partial expectation-maximization (PEM) algorithm is developed to ease this computational burden. This PEM algorithm circumvents the need for a traditional E-step by performing a partial E-step that reduces the Kullback-Leibler divergence between the conditional
-
Statistical inference on group Rasch mixture network models Stat (IF 2.451) Pub Date : 2021-11-05 Yuhang Long, Tao Huang
In a two-mode network, the nodes are divided into two types (primary nodes and secondary nodes), and connections exist only between nodes of different types. In reality, in such a two-mode network, one-mode network connections may also exist among primary nodes, and these two kinds of networks are usually not independent and coexistent. In this paper, we first propose a group Rasch mixture network
-
Sparse multivariate functional principal component analysis Stat (IF 2.451) Pub Date : 2021-11-01 Jun Song, Kyongwon Kim
We introduce a sparse multivariate functional principal component analysis method by applying ideas from the group sparse maximum variance method to multivariate functional data. Our method can avoid the “curse of dimensionality” from a high-dimensional dataset and enjoy interpretability at the same time. In particular, our unsupervised method can capture important latent factors to explain variability
-
Confidence intervals that utilize sparsity Stat (IF 2.451) Pub Date : 2021-10-20 Paul Kabaila, David Farchione
We consider a linear regression model with orthogonal regressors and Gaussian random errors with known variance, in the low-dimensional setting that the length of the regression parameter vector does not exceed the length of the response vector. We suppose that we have uncertain prior information that sparsity holds, that is, that many of the components of the regression parameter vector are zero.
-
Valid two-sample graph testing via optimal transport Procrustes and multiscale graph correlation with applications in connectomics Stat (IF 2.451) Pub Date : 2021-10-19 Jaewon Chung, Bijan Varjavand, Jesús Arroyo-Relión, Anton Alyakin, Joshua Agterberg, Minh Tang, Carey E. Priebe, Joshua T. Vogelstein
Testing whether two graphs come from the same distribution is of interest in many real-world scenarios, including brain network analysis. Under the random dot product graph model, the nonparametric hypothesis testing framework consists of embedding the graphs using the adjacency spectral embedding (ASE), followed by aligning the embeddings using the median flip heuristic and finally applying the nonparametric