Local uncertainty sampling for large-scale multiclass logistic regression
Annals of Statistics (IF 3.2), Pub Date: 2020-06-01, DOI: 10.1214/19-aos1867
Lei Han, Kean Ming Tan, Ting Yang, Tong Zhang

A major challenge in building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach to this problem is to employ a subsampled dataset that can be handled by the available computational resources. In this paper, we propose a general subsampling scheme for large-scale multiclass logistic regression and examine the variance of the resulting estimator. We show that, asymptotically, the proposed method always achieves a smaller variance than uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. The empirical performance of the proposed method is compared with that of other methods on both simulated and real-world datasets, and the results confirm our theoretical analysis.

Updated: 2020-06-01