-
Online learning for streaming data classification in nonstationary environments Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-03-09 Yujie Gai, Kang Meng, Xiaodi Wang
In this article, we implement the classification of nonstationary streaming data. Due to the inability to obtain full data in the context of streaming data, we adopt a strategy based on clustering structure for data classification. Specifically, this strategy involves dynamically maintaining clustering structures to update the model, thereby updating the objective function for classification. Simultaneously
-
Error‐controlled feature selection for ultrahigh‐dimensional and highly correlated feature space using deep learning Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-03-05 Arkaprabha Ganguli, Tapabrata Maiti, David Todem
Deep learning has been at the center of analytics in recent years due to its impressive empirical success in analyzing complex data objects. Despite this success, most existing tools behave like black‐box machines, thus the increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature‐selected deep learning has emerged as a promising
-
Marginal clustered multistate models for longitudinal progressive processes with informative cluster size Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-03-04 Sean Xinyang Feng, Aya A. Mitani
Informative cluster size (ICS) is a phenomenon where cluster size is related to the outcome. While multistate models can be applied to characterize the unit‐level transition process for clustered interval‐censored data, there is a research gap addressing ICS within this framework. We propose two extensions of the multistate model that account for ICS to make marginal inference: one by incorporating within‐cluster
-
A novel two‐step extrapolation‐insertion risk model based on the Expectile under the Pareto‐type distribution Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-21 Ziwen Geng
Developing a catastrophe loss model is a challenging problem in the insurance industry. In the context of Pareto‐type distributions, measuring risk at the extreme right tail has become a major focus of academic research. The quantile and Expectile of a distribution are found to be useful descriptors of its tail, in the same way as the median and mean are related to its central behavior. In this article
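The analogy between quantile/median and expectile/mean can be illustrated with a minimal sketch. It is not the article's two-step model: the function name `expectile`, the fixed-point iteration, and the simulated Pareto sample are all assumptions made here for illustration.

```python
import numpy as np

def expectile(x, tau, tol=1e-10, max_iter=1000):
    """tau-expectile of a sample via fixed-point iteration (illustrative helper).

    The tau-expectile minimizes an asymmetrically weighted squared loss:
    observations above the current value get weight tau, the rest 1 - tau.
    """
    x = np.asarray(x, dtype=float)
    e = x.mean()  # starting value
    for _ in range(max_iter):
        w = np.where(x > e, tau, 1.0 - tau)
        e_new = np.sum(w * x) / np.sum(w)  # weighted mean under current weights
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    return e

# Simulated heavy-tailed losses (assumed data, Pareto with shape 3 and scale 1).
rng = np.random.default_rng(0)
losses = rng.pareto(3.0, size=10_000) + 1.0

print("0.5-quantile vs median:", np.quantile(losses, 0.5), np.median(losses))
print("0.5-expectile vs mean: ", expectile(losses, 0.5), losses.mean())
print("0.99 tail: quantile", np.quantile(losses, 0.99), "expectile", expectile(losses, 0.99))
```

At level 0.5 the quantile coincides with the median and the expectile with the mean; levels near 1 describe the right tail.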
-
Bayesian inference for nonprobability samples with nonignorable missingness Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-21 Zhan Liu, Xuesong Chen, Ruohan Li, Lanbao Hou
Nonprobability samples, especially web survey data, have been available in many different fields. However, nonprobability samples suffer from selection bias, which will yield biased estimates. Moreover, missingness, especially nonignorable missingness, may also be encountered in nonprobability samples. Thus, it is a challenging task to make inference from nonprobability samples with nonignorable missingness
-
Modeling matrix variate time series via hidden Markov models with skewed emissions Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-20 Michael P. B. Gallaugher, Xuwen Zhu
Data collected today have increasingly become more complex and cannot be analyzed using regular statistical methods. Matrix variate time series data is one such example where the observations in the time series are matrices. Herein, we introduce a set of three hidden Markov models using skewed matrix variate emission distributions for modeling matrix variate time series data. Compared to the hidden
-
A deep learning approach for the comparison of handwritten documents using latent feature vectors Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-07 Juhyeon Kim, Soyoung Park, Alicia Carriquiry
Forensic questioned document examiners still largely rely on visual assessments and expert judgment to determine the provenance of a handwritten document. Here, we propose a novel approach to objectively compare two handwritten documents using a deep learning algorithm. First, we implement a bootstrapping technique to segment document data into smaller units, as a means to enhance the efficiency of
-
Subsampling under distributional constraints Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-09 Florian Combes, Ricardo Fraiman, Badih Ghattas
Some complex models are frequently employed to describe physical and mechanical phenomena. In this setting, we have an input $X$ in a general space, and an output $Y=f(X)$ where $f$ is a very complicated function, whose computational cost for every new input is very high, and may be also very expensive. We are given two sets of observations of $X$, $S_1$ and $S_2$
-
Rarity updated ensemble with oversampling: An ensemble approach to classification of imbalanced data streams Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-02-09 Zahra Nouri, Vahid Kiani, Hamid Fadishei
Today's ever-increasing generation of streaming data demands novel data mining approaches tailored to mining dynamic data streams. Data streams are non-static in nature, continuously generated, and endless. They often suffer from class imbalance and undergo temporal drift. To address the classification of consecutive data instances within imbalanced data streams, this research introduces a new ensemble
-
Sparse Bayesian variable selection in high-dimensional logistic regression models with correlated priors Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-01-30 Zhuanzhuan Ma, Zifei Han, Souparno Ghosh, Liucang Wu, Min Wang
In this paper, we propose a sparse Bayesian procedure with global and local (GL) shrinkage priors for the problems of variable selection and classification in high-dimensional logistic regression models. In particular, we consider two types of GL shrinkage priors for the regression coefficients, the horseshoe (HS) prior and the normal-gamma (NG) prior, and then specify a correlated prior for the binary
-
Considerations in Bayesian agent-based modeling for the analysis of COVID-19 data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-01-25 Seungha Um, Samrachana Adhikari
The agent-based model (ABM) has been widely used to study infectious disease transmission by simulating the behaviors and interactions of autonomous individuals called agents. In the ABM, agent states, for example infected or susceptible, are assigned according to a set of simple rules, and the complex dynamics of disease transmission are described by the collective states of agents over time. Despite the flexibility
-
Imputed quantile vector autoregressive model for multivariate spatial–temporal data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-01-25 Liang Jinwen, Tian Maozai
Imputing missing values in multivariate spatial–temporal data is important in many fields. Existing low-rank tensor learning methods are popular for handling this task but are sensitive to high levels of skewness. The aim of this paper is to develop an alternative method with robustness and high imputation accuracy for multivariate spatial–temporal data. In view of the fact that quantile regression
-
Study of a bounded interval perks distribution with quantile regression analysis Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-01-25 Laila A. Al-Essa, Shakaiba Shafiq, Deniz Ozonur, Farrukh Jamal
In this article, a novel bounded interval model called the unit-Perks model is developed by suitably transforming the positive random variable of the Perks distribution. Numerous statistical features of the bounded interval Perks model are being explored based on the expansion of the density function. Eight distinct estimation approaches are being used to estimate the parameters of the unit-Perks model
-
Nonparametric Bayesian functional clustering with applications to racial disparities in breast cancer Stat. Anal. Data Min. (IF 1.3) Pub Date : 2024-01-25 Wenyu Gao, Inyoung Kim, Wonil Nam, Xiang Ren, Wei Zhou, Masoud Agah
As we have easier access to massive data sets, functional analyses have gained more interest. However, such data sets often contain large heterogeneities, noises, and dimensionalities. When generalizing the analyses from vectors to functions, classical methods might not work directly. This paper considers noisy information reduction in functional analyses from two perspectives: functional clustering
-
Boosting diversity in regression ensembles Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-12-30 Mathias Bourel, Jairo Cugliari, Yannig Goude, Jean-Michel Poggi
Ensemble methods, such as Bagging, Boosting, or Random Forests, often enhance the prediction performance of single learners on both classification and regression tasks. In the context of regression, we propose a gradient boosting-based algorithm incorporating a diversity term with the aim of constructing different learners that enrich the ensemble while achieving a trade-off of some individual optimality
-
Multivariate contaminated normal mixture regression modeling of longitudinal data based on joint mean-covariance model Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-12-22 Niu Xiaoyu, Tian Yuzhu, Tang Manlai, Tian Maozai
Outliers are common in longitudinal data analysis, and the multivariate contaminated normal (MCN) distribution in model-based clustering is often used to detect outliers and provide robust parameter estimates in each subgroup. In this paper, we propose a method, the mixture of MCN (MCNM), based on the joint mean-covariance model, specifically designed to analyze longitudinal data characterized by mild
-
A machine learning oracle for parameter estimation Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-12-09 Lucas Koepke, Mary Gregg, Michael Frey
Competing procedures, involving data smoothing, weighting, imputation, outlier removal, etc., may be available to prepare data for parametric model estimation. Often, however, little is known about the best choice of preparatory procedure for the planned estimation and the observed data. A machine learning-based decision rule, an “oracle,” can be constructed in such cases to decide the best procedure
-
The generalized hyperbolic family and automatic model selection through the multiple-choice LASSO Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-12-08 Luca Bagnato, Alessio Farcomeni, Antonio Punzo
We revisit the generalized hyperbolic (GH) distribution and its nested models. These include widely used parametric choices like the multivariate normal, skew-$t$, Laplace, and several others. We also introduce the multiple-choice LASSO, a novel penalized method for choosing among alternative constraints on the same parameter. A hierarchical multiple-choice Least Absolute Shrinkage and Selection
-
Spatially-correlated time series clustering using location-dependent Dirichlet process mixture model Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-11-22 Junsub Jung, Sungil Kim, Heeyoung Kim
The Dirichlet process mixture (DPM) model has been widely used as a Bayesian nonparametric model for clustering. However, the exchangeability assumption of the Dirichlet process is not valid for clustering spatially correlated time series as these data are indexed spatially and temporally. While analyzing spatially correlated time series, correlations between observations at proximal times and locations
-
Modeling subpopulations for hierarchically structured data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-11-22 Andrew Simpson, Semhar Michael, Dylan Borchert, Christopher Saunders, Larry Tang
The field of forensic statistics offers a unique hierarchical data structure in which a population is composed of several subpopulations of sources and a sample is collected from each source. This subpopulation structure creates an additional layer of complexity. Hence, the data has a hierarchical structure in addition to the existence of underlying subpopulations. Finite mixtures are known for modeling
-
Input-response space-filling designs incorporating response uncertainty Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-11-20 Xiankui Yang, Lu Lu, Christine M. Anderson-Cook
Traditionally space-filling designs have focused on the characteristics of the design in the input space ensuring uniform spread throughout the region. Input-response space-filling designs considered scenarios when having good spread throughout the range or region of the responses is also of interest. This paper acknowledges that there is typically uncertainty associated with the values of the response(s)
-
Driving mode analysis—How uncertain functional inputs propagate to an output Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-10-06 Scott A. Vander Wiel, Michael J. Grosskopf, Isaac J. Michaud, Denise Neudecker
Driving mode analysis elucidates how correlated features of uncertain functional inputs jointly propagate to produce uncertainty in the output of a computation. Uncertain input functions are decomposed into three terms: the mean functions, a zero-mean driving mode, and zero-mean residual. The random driving mode varies along a single direction, having fixed functional shape and random scale. It is
-
Stratified learning: A general-purpose statistical method for improved learning under covariate shift Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-09-29 Maximilian Autenrieth, David A. van Dyk, Roberto Trotta, David C. Stenning
We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative, a situation known as covariate shift. We build upon a well-established methodology in causal inference and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores. In practice, this is achieved
-
Residuals and diagnostics for multinomial regression models Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-09-29 Eric A. E. Gerber, Bruce A. Craig
In this paper, we extend the concept of a randomized quantile residual to multinomial regression models. Customary diagnostics for these models are limited because they involve difficult-to-interpret residuals and often focus on the fit of one category versus the rest. Our residuals account for associations between categories by using the squared Mahalanobis distances of the observed log-odds relative
-
On difference-based gradient estimation in nonparametric regression Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-09-16 Maoyu Zhang, Wenlin Dai
We propose a framework to directly estimate the gradient in multivariate nonparametric regression models that bypasses fitting the regression function. Specifically, we construct the estimator as a linear combination of adjacent observations with the coefficients from a vector-valued difference sequence, so it is more flexible than existing methods. Under the equidistant designs, closed-form solutions
-
Confidence bounds for threshold similarity graph in random variable network Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-09-07 P. A. Koldanov, A. P. Koldanov, D. P. Semenov
The problem of uncertainty in graph structure identification in a random variable network is considered. An approach for the construction of upper and lower confidence bounds for graph structures is developed. This approach is applied to the construction of upper and lower confidence bounds for the threshold similarity graph. The stability of confidence bounds and gaps between upper and lower confidence
-
An Improved D2GAN-based oversampling algorithm for imbalanced data classification Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-08-25 Xiaoqiang Zhao, Qinglei Yao
To address the problems of pattern collapse, uncontrollable data generation and high overlap rate when generative adversarial network (GAN) oversamples imbalanced data, we propose an imbalanced data oversampling algorithm based on improved dual discriminator generative adversarial nets (D2GAN). First, we integrate the positive class attribute information into the generator and the discriminator to
-
A neutral zone classifier for three classes with an application to text mining Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-08-21 Dylan C. Friel, Yunzhe Li, Benjamin Ellis, Daniel R. Jeske, Herbert K. H. Lee, Philip H. Kass
A classifier may be limited by its conditional misclassification rates more than its overall misclassification rate. In the case that one or more of the conditional misclassification rates are high, a neutral zone may be introduced to decrease and possibly balance the misclassification rates. In this paper, a neutral zone is incorporated into a three-class classifier with its region determined by controlling
-
Ensemble learning for score likelihood ratios under the common source problem Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-08-04 Federico Veneri, Danica M. Ommen
Machine learning-based score likelihood ratios (SLRs) have emerged as alternatives to traditional likelihood ratios and Bayes factors to quantify the value of evidence when contrasting two opposing propositions. When developing a conventional statistical model is infeasible, machine learning can be used to construct a (dis)similarity score for complex data and estimate the ratio of the conditional
-
Nonparametric clustering of RNA-sequencing data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-07-30 Gabriel Lozano, Nadia Atallah, Michael Levine
Identification of clusters of co-expressed genes in transcriptomic data is a difficult task. Most algorithms used for this purpose can be classified into two broad categories: distance-based or model-based approaches. Distance-based approaches typically utilize a distance function between pairs of data objects and group similar objects together into clusters. Model-based approaches are based on using
-
A finely tuned deep transfer learning algorithm to compare outsole images Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-07-28 Moonsoo Jang, Soyoung Park, Alicia Carriquiry
In forensic practice, evaluating shoeprint evidence is challenging because the differences between images of two different outsoles can be subtle. In this paper, we propose a deep transfer learning-based matching algorithm called the Shoe-MS algorithm that quantifies the similarity between two outsole images. The Shoe-MS algorithm consists of a Siamese neural network for two input images followed by
-
Traditional kriging versus modern Gaussian processes for large-scale mining data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-07-21 Ryan B. Christianson, Ryan M. Pollyea, Robert B. Gramacy
The canonical technique for nonlinear modeling of spatial/point-referenced data is known as kriging in geostatistics, and as Gaussian Process (GP) regression for surrogate modeling and statistical learning. This article reviews many similarities shared between kriging and GPs, but also highlights some important differences. One is that GPs impose a process that can be used to automate kernel/variogram
-
A deep learning factor analysis model based on importance-weighted variational inference and normalizing flow priors: Evaluation within a set of multidimensional performance assessments in youth elite soccer players Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-06-30 Pascal Kilian, Daniel Leyhr, Christopher J. Urban, Oliver Höner, Augustin Kelava
Exploratory factor analysis is a widely used framework in the social and behavioral sciences. Since measurement errors are always present in human behavior data, latent factors, generating the observed data, are important to identify. While most factor analysis methods rely on linear relationships in the data-generating process, deep learning models can provide more flexible modeling approaches. However
-
Categorical classifiers in multiclass classification with imbalanced datasets Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-05-18 Maurizio Carpita, Silvia Golia
This paper discusses, in a multiclass classification setting, the issue of the choice of the so-called categorical classifier, which is the procedure or criterion that transforms the probabilities produced by a probabilistic classifier into a single category or class. The standard choice is the Bayes Classifier (BC), but it has some limits with rare classes. This paper studies the classification performance
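As a minimal sketch of what a categorical classifier does, the Bayes Classifier can be written as an argmax over the predicted class probabilities. The function name `bayes_classifier` and the probability values are made up for illustration and do not come from the paper.

```python
import numpy as np

def bayes_classifier(probs):
    """Map each row of class probabilities to the most probable class index."""
    return np.argmax(np.asarray(probs), axis=1)

# Hypothetical output of a probabilistic classifier for three units and
# three classes; class 2 plays the role of a rare class.
probs = np.array([
    [0.70, 0.25, 0.05],
    [0.45, 0.15, 0.40],
    [0.34, 0.33, 0.33],
])

labels = bayes_classifier(probs)
print(labels)
```

In this toy input the rare class never attains the largest probability, so it is never predicted, which is the kind of limitation with rare classes that the abstract mentions.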
-
A new parametric approach to gender gap with application to EUSILC data in Poland and Italy Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-05-15 Francesca Greselin, Alina Jȩdrzejczak, Kamila Trzcińska
Real income distribution comparisons are of interest to policy makers across European countries. Nowadays, a crucial component of income inequality remains the discrepancy between men and women, often called the gender gap. Since the gender gap is related to the whole distribution of incomes in a population, popular single metrics are not adequate, and previous studies applied the relative distribution
-
Semiparametric detection of changepoints in location, scale, and copula Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-04-29 Gaurav Agarwal, Idris A. Eckley, Paul Fearnhead
This paper proposes a new method to detect changepoints in the location and scale of univariate data sequences. The proposed method assumes that the data belong to the location-scale family of distributions and estimate the associated densities nonparametrically. Specifically, the approach does not require knowledge of the functional form of the distribution of the data sequence. As such, the approach
-
A new formulation of sparse multiple kernel $k$-means clustering and its applications Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-04-18 Wentao Qu, Xianchao Xiu, Jun Sun, Lingchen Kong
Multiple kernel $k$-means (MKKM) clustering has been an important research topic in statistical machine learning and data mining over the last few decades. MKKM combines a group of prespecified base kernels to improve the clustering performance. Although many efforts have been made to improve the performance of MKKM further, the present works do not sufficiently consider the potential structure
-
Association rules and decision rules Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-04-14 Abdelkader Mokkadem, Mariane Pelletier, Louis Raimbault
Determining association rules of significant interest is an essential task within data mining and statistical analysis. In this paper, we first precisely define the notion of association rule. For this, we introduce a general model, which includes the usual transaction model, and which allows many operations on the association rules. Then, we interpret association rules as statistical decision rules
-
Simplicial depth: Characterization and reconstruction Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-04-12 Petra Laketa, Stanislav Nagy
Statistical depth functions have been designed with the intention of extending nonparametric inference toward multivariate setups. As such, the depths should serve as multivariate analogues of the quantile functions known from the analysis of real-valued data. The so-called characterization and reconstruction questions are among the fundamental open problems of the contemporary depth research. Roughly
-
Share density-based clustering of income data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-04-03 Francesca Condino
The Lorenz curve is a fundamental tool for analyzing income and wealth distribution and inequality. Indeed, the Lorenz curve and its derivative, the so-called share density, provide valuable information regarding inequality. There is a widely recognized connection between the Lorenz curve and elements from the field of information theory. Starting from this evidence, the aim of this work is to compare the
-
Buckley–James estimation of generalized additive accelerated lifetime model with ultrahigh-dimensional data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-02-22 Zichang Li, Xuejing Zhao
High-dimensional covariates in lifetime data are a challenge in survival analysis, especially in gene expression profiles. The objective of this paper is to propose an efficient algorithm to extend the generalized additive model to survival data with high-dimensional covariates. The algorithm combines the generalized additive model (GAM) and Buckley–James estimation, which makes a nonparametric extension
-
Lq regularization for fair artificial intelligence robust to covariate shift Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-02-22 Seonghyeon Kim, Sara Kim, Kunwoong Kim, Yongdai Kim
It is well recognized that historical biases exist in training data against a certain sensitive group (e.g., non-White, women) which are socially unacceptable, and these unfair biases are inherited in trained artificial intelligence (AI) models. Various learning algorithms have been proposed to remove or alleviate unfair biases in trained AI models. In this paper, we consider another type of bias in
-
Doubly robust estimation for non-probability samples with modified intertwined probabilistic factors decoupling Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-02-18 Zhan Liu, Junbo Zheng, Yingli Pan
In recent years, non-probability samples, such as web survey samples, have become increasingly popular in many fields, but they may be subject to selection biases, which make inference from them difficult. Doubly robust (DR) estimation is one of the approaches to making inferences from non-probability samples. When many covariates are available, variable selection becomes important in
-
Adaptive boosting for ordinal target variables using neural networks Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-26 Insung Um, Geonseok Lee, Kichun Lee
Boosting has proven its superiority by increasing the diversity of base classifiers, mainly in various classification problems. In reality, target variables in classification often are formed by numerical variables, in possession of ordinal information. However, existing boosting algorithms for classification are unable to reflect such ordinal target variables, resulting in non-optimal solutions. In
-
Bilateral-Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-14 Gábor Hannák, Gábor Horváth, Attila Kádár, Márk Dániel Szalai
We propose a method called Bilateral-Weighted Online Adaptive Isolation Forest (BWOAIF) for unsupervised anomaly detection based on Isolation Forest (IF), which is applicable to streaming data and able to cope with concept drift. Similar to IF, the proposed method has only a few hyperparameters, whose effects on performance are easy to interpret by human intuition and therefore easy to tune. BWOAIF
-
Specifying composites in structural equation modeling: A refinement of the Henseler–Ogasawara specification Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-05 Xi Yu, Florian Schuberth, Jörg Henseler
Structural equation modeling (SEM) plays an important role in business and social science and so do composites, that is, linear combinations of variables. However, existing approaches to integrate composites into structural equation models still have limitations. A major leap forward has been the Henseler–Ogasawara (H–O) specification, which for the first time allows for seamlessly integrating composites
-
Robust deep neural network surrogate models with uncertainty quantification via adversarial training Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-04 Lixiang Zhang, Jia Li
Surrogate models have been used to emulate mathematical simulators of physical or biological processes for computational efficiency. High-speed simulation is crucial for conducting uncertainty quantification (UQ) when the simulation must repeat over many randomly sampled input points (aka the Monte Carlo method). A simulator can be so computationally intensive that UQ is only feasible with a surrogate
-
Model selection with bootstrap validation Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-04 Rafael Savvides, Jarmo Mäkelä, Kai Puolamäki
Model selection is one of the most central tasks in supervised learning. Validation set methods are the standard way to accomplish this task: models are trained on training data, and the model with the smallest loss on the validation data is selected. However, it is generally not obvious how much validation data is required to make a reliable selection, which is essential when labeled data are scarce
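The standard validation set procedure described above can be sketched as follows. This shows plain validation-set selection, not the paper's bootstrap method; the simulated data, the polynomial candidates, and the names `validation_loss`, `candidates`, and `best` are assumptions made for illustration.

```python
import numpy as np

# Simulated regression data (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.size)

# Split into training data and validation data.
x_train, y_train = x[:150], y[:150]
x_val, y_val = x[150:], y[150:]

def validation_loss(degree):
    """Fit a polynomial on the training data; return its validation MSE."""
    coefs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coefs, x_val)
    return float(np.mean((pred - y_val) ** 2))

# Candidate models are polynomial degrees (a hypothetical model family).
candidates = [1, 3, 5, 9]
losses = {d: validation_loss(d) for d in candidates}

# Select the model with the smallest loss on the validation data.
best = min(losses, key=losses.get)
print(losses, "selected degree:", best)
```

With only 50 validation points the ranking of close candidates can change from one split to another, which is the reliability question the abstract raises.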
-
Hierarchy-assisted gene expression regulatory network analysis Stat. Anal. Data Min. (IF 1.3) Pub Date : 2023-01-04 Han Yan, Sanguo Zhang, Shuangge Ma
Gene expressions have been extensively studied in biomedical research. With gene expression, network analysis, which takes a system perspective and examines the interconnections among genes, has been established as highly important and meaningful. In the construction of gene expression networks, a commonly adopted technique is high-dimensional regularized regression. Network construction can be unadjusted
-
Semi-supervised multi-label learning with missing labels by exploiting feature-label correlations Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-12-31 Runxin Li, Xuefeng Zhao, Zhenhong Shang, Lianyin Jia
The majority of multi-label learning techniques now in use presuppose that there will be enough labeled instances. But in real-world applications, it is frequently the case that only partial labels are included for each training instance. This is either because getting a fully labeled training set takes a lot of time and effort or because doing so is expensive. Multi-label learning with missing labels, on
-
Simplicial depth and its median: Selected properties and limitations Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-12-02 Stanislav Nagy
Depth functions are important tools of nonparametric statistics that extend orderings, ranks, and quantiles to the setup of multivariate data. We revisit the classical definition of the simplicial depth and explore its theoretical properties when evaluated with respect to datasets or measures that do not necessarily possess a symmetric density. Recent advances from discrete geometry are used to refine
-
Corrigendum Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-11-21
In Mohammadi [1], the authors regret that the citation for the bds.m code written by Professor Ludwig Kanzler was not included in Table 1. The updated Table 1 note is shown below. Note: Tests are Lee, White, and Granger (LWG); Terasvirta, Lee, and Granger (TLG); Tsay; Brock, Dechert, and Scheinkman (BDS); McLeod-Li (MCLoLi); Ramsey; Keenan; and AutoRegressive Conditional Heteroscedasticity
-
Randomized algorithms for tensor response regression Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-11-21 Zhe Cheng, Xiangjian Xu, Zihao Song, Weihua Zhao
In this paper, we consider estimation algorithms for the tensor response on vector covariate regression model. Based on the projection theory of tensors and the idea of randomized algorithms for tensor decomposition, three new algorithms named SHOLRR, RHOLRR and RSHOLRR are proposed under the low-rank Tucker decomposition, and some theoretical analyses for the two randomized algorithms are also provided. To explore
-
Cluster analysis via random partition distributions Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-11-12 David B. Dahl, Jacob Andros, J. Brandon Carter
Hierarchical and k-medoids clustering are deterministic clustering algorithms defined on pairwise distances. We use these same pairwise distances in a novel stochastic clustering procedure based on a probability distribution. We call our proposed method CaviarPD, a portmanteau from cluster analysis via random partition distributions. CaviarPD first samples clusterings from a distribution on partitions
-
Integrative learning of structured high-dimensional data from multiple datasets Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-11-08 Changgee Chang, Zongyu Dai, Jihwan Oh, Qi Long
Integrative learning of multiple datasets has the potential to mitigate the challenge of small $n$ and large $p$ that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although
-
Local support vector machine based dimension reduction Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-10-17 Linxi Li, Qin Wang, Chenlu Ke
Motivated by several recent works that adopt support vector machines in sufficient dimension reduction research, we propose a local support vector machine based dimension reduction approach. The proposal deals with continuous and binary responses, linear and nonlinear dimension reduction in a unified framework. The localization can also help relax the stringent probabilistic assumptions required
-
Frequentist model averaging for zero-inflated Poisson regression models Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-10-05 Jianhong Zhou, Alan T. K. Wan, Dalei Yu
This paper considers frequentist model averaging for estimating the unknown parameters of the zero-inflated Poisson regression model. Our proposed weight choice procedure is based on the minimization of an unbiased estimator of a conditional quadratic loss function. We prove that the resulting model average estimator enjoys optimal asymptotic property and improves finite sample properties over the
-
Evaluation and interpretation of driving risks: Automobile claim frequency modeling with telematics data Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-09-28 Yaqian Gao, Yifan Huang, Shengwang Meng
With the development of vehicle telematics and data mining technology, usage-based insurance (UBI) has aroused widespread interest from both academia and industry. The extensive driving behavior features make it possible to further understand the risks of insured vehicles, but pose challenges in the identification and interpretation of important ratemaking factors. This study, based on the telematics
-
Feature screening of ultrahigh dimensional longitudinal data based on the C-statistic Stat. Anal. Data Min. (IF 1.3) Pub Date : 2022-09-26 Peng Lai, Qing Di, Zhezi Shen, Yanqiu Zhou
This paper considers the feature screening method for the ultrahigh dimensional semiparametric linear models with longitudinal data. The C-statistic which measures the rank concordance between predictors and outcomes is generalized to the longitudinal data. On the basis of C-statistic and the score equation theory, we propose a feature screening method named LCSIS. Based on the smoothed technique and