Automatic variable selection in a linear model on massive data, Communications in Statistics - Simulation and Computation



Automatic variable selection in a linear model on massive data
Communications in Statistics - Simulation and Computation (IF 0.9), Pub Date: 2020-05-13, DOI: 10.1080/03610918.2020.1752377
Gabriela Ciuperca


Abstract

For a linear model on massive data, we propose an aggregated estimator built from adaptive LASSO estimators. The proposed method reduces the volume of data that must be stored and yields an aggregated estimator that automatically selects the significant explanatory variables with probability converging to one. Moreover, the components of the aggregated estimator corresponding to the nonzero true parameters have the same asymptotic normal distribution as the adaptive LASSO estimator computed on the full data. In practice, however, the full-data estimator cannot be computed when the data are massive, for lack of memory or storage. A further benefit of our method is therefore that it works around the insufficient-memory problem that statistical software runs into when the number of observations is very large. The empirical performance is investigated in a comparative simulation study, and a real data example illustrates the usefulness of the method.
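The abstract does not spell out how the block-wise estimators are combined, so the following is only a minimal sketch of the general idea under stated assumptions: the data are split into equal-size blocks, an adaptive LASSO is fitted on each block (weights taken from that block's least-squares fit), and the block estimates are averaged, with a coefficient kept only if a majority of blocks selected it. The averaging and majority-vote rules, the function names `lasso_cd`, `adaptive_lasso` and `aggregated_estimator`, and the tuning values `lam` and `gamma` are illustrative choices, not taken from the paper.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Plain LASSO by coordinate descent: min (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # remove column j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    """Adaptive LASSO with weights 1/|beta_ols|^gamma, solved by column rescaling."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    scale = np.abs(beta_ols) ** gamma           # inverse of the adaptive weights
    beta_tilde = lasso_cd(X * scale, y, lam)
    return beta_tilde * scale


def aggregated_estimator(X, y, n_blocks, lam, gamma=1.0):
    """Assumed aggregation: mean of block estimates, majority vote on the support."""
    blocks = np.array_split(np.arange(len(y)), n_blocks)
    estimates = np.array(
        [adaptive_lasso(X[idx], y[idx], lam, gamma) for idx in blocks]
    )
    selected = (estimates != 0).mean(axis=0) > 0.5
    return np.where(selected, estimates.mean(axis=0), 0.0)


# Small synthetic illustration (simulated data, not from the paper).
rng = np.random.default_rng(0)
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 1.0])
X = rng.standard_normal((6000, beta_true.size))
y = X @ beta_true + rng.standard_normal(6000)
beta_hat = aggregated_estimator(X, y, n_blocks=6, lam=0.05)
print(beta_hat)
```

In a genuinely massive-data setting each block would be read from disk and discarded after its estimate is computed, so only one block and the small array of block estimates need to be in memory at any time.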


Updated: 2020-05-13
