Subdata selection algorithm for linear model discrimination
Statistical Papers (IF 1.3), Pub Date: 2022-03-03, DOI: 10.1007/s00362-022-01299-8
Jun Yu, HaiYing Wang

A statistical method is likely to be sub-optimal if the assumed model does not reflect the structure of the data at hand. For this reason, it is important to perform model selection before statistical analysis. However, selecting an appropriate model from a large candidate pool is usually computationally infeasible when faced with a massive data set, and little work has been done to study data selection for model selection. In this work, we propose a subdata selection method based on leverage scores which enables us to conduct the selection task on a small subdata set. Compared with existing subsampling methods, our method not only improves the probability of selecting the best model but also enhances the estimation efficiency. We justify this both theoretically and numerically. Several examples are presented to illustrate the proposed method.
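The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of leverage-based subdata selection followed by model selection on the subdata. Everything in it is an assumption made for illustration: the helper names (`leverage_scores`, `select_subdata`, `bic`), the rule of keeping the `k` rows with the largest leverage in the full candidate design, the use of BIC as the selection criterion, and the simulated data. It should not be read as the authors' procedure.

```python
import numpy as np

def leverage_scores(X):
    """Diagonal of the hat matrix X (X'X)^{-1} X', computed from a thin QR."""
    Q, _ = np.linalg.qr(X)          # reduced QR: Q has orthonormal columns
    return np.sum(Q ** 2, axis=1)   # squared row norms of Q are the leverages

def select_subdata(X, k):
    """Indices of the k rows with the largest leverage (illustrative rule only)."""
    h = leverage_scores(X)
    return np.argsort(h)[-k:]

def bic(X, y):
    """Gaussian BIC of an ordinary least squares fit, up to an additive constant."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

# Simulated data (hypothetical): intercept plus three covariates, only the first is active.
rng = np.random.default_rng(0)
n, k = 10_000, 200
Z = rng.standard_normal((n, 3))
X_full = np.column_stack([np.ones(n), Z])
y = 1.0 + 2.0 * Z[:, 0] + rng.standard_normal(n)

# Select subdata once, using the largest candidate model's design matrix.
idx = select_subdata(X_full, k)

# Compare nested candidate models on the subdata only.
candidates = {"x1": [0, 1], "x1+x2": [0, 1, 2], "x1+x2+x3": [0, 1, 2, 3]}
scores = {name: bic(X_full[idx][:, cols], y[idx]) for name, cols in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

In this sketch the expensive step, fitting every candidate model, runs on `k` rows instead of `n`; only the leverage computation touches the full data.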

Updated: 2022-03-03
