Information-based optimal subdata selection for big data logistic regression
Journal of Statistical Planning and Inference (IF 0.9), Pub Date: 2020-12-01, DOI: 10.1016/j.jspi.2020.03.004
Qianshun Cheng, HaiYing Wang, Min Yang

Abstract: Technological advances have enabled exponential growth in data volumes, and proven statistical methods are no longer applicable to extraordinarily large data sets because of computational limitations. Subdata selection is an effective strategy to address this issue. In this study, we investigate existing sampling approaches and propose a novel framework for selecting subsets of data for logistic regression models. We show that, while the information contained in subdata obtained by random sampling is limited by the size of the subset, the information contained in subdata obtained under the new framework increases as the size of the full data set increases. The performance of the proposed approach and of existing methods is compared under various criteria via extensive simulation studies.
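
The statement about random sampling can be illustrated with a small sketch. This is not the selection method proposed in the paper, which the abstract does not describe. It only computes the Fisher information matrix of a two-parameter logistic model (intercept plus one covariate) for a uniform random subsample of fixed size k. The parameter values, the subsample size, and the standard normal covariates are all assumptions made for illustration. Since the matrix is a sum of k terms, each weighted by p(1 - p), which is at most 1/4, its magnitude is governed by k and not by the full data size N.

```python
import math
import random


def fisher_information(xs, beta0, beta1):
    """Entries (m00, m01, m11) of the 2x2 Fisher information matrix
    for a logistic model with an intercept and one covariate."""
    m00 = m01 = m11 = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
        w = p * (1.0 - p)  # Bernoulli variance weight, never above 1/4
        m00 += w
        m01 += w * x
        m11 += w * x * x
    return m00, m01, m11


def det_information(xs, beta0, beta1):
    """Determinant of the information matrix (a D-optimality style summary)."""
    m00, m01, m11 = fisher_information(xs, beta0, beta1)
    return m00 * m11 - m01 * m01


def uniform_subsample_det(full_x, k, beta0, beta1, rng):
    """Information determinant of a uniform random subsample of size k."""
    sub = rng.sample(full_x, k)
    return det_information(sub, beta0, beta1)


if __name__ == "__main__":
    rng = random.Random(0)
    beta0, beta1 = 0.5, 1.0  # hypothetical parameter values
    k = 500                  # hypothetical subdata size
    for n in (5_000, 50_000, 200_000):
        # Assumed covariate distribution: standard normal.
        full_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = uniform_subsample_det(full_x, k, beta0, beta1, rng)
        print(f"N = {n:>7}, k = {k}, det of information = {d:.1f}")
```

Running the script prints the determinant for several values of N at the same k. Under these assumptions the values should stay on the same order of magnitude, because a uniform subsample looks the same in distribution whatever the size of the full data set.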

Updated: 2020-12-01