Logistic Regression Models for Aggregated Data
Journal of Computational and Graphical Statistics (IF 1.4), Pub Date: 2021-04-20, DOI: 10.1080/10618600.2021.1895816
T. Whitaker, B. Beranger, S. A. Sisson

Abstract

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from symbolic data analysis to summarize the collection of predictor variables into histogram form, and perform inference on this summary dataset. We develop ideas based on composite likelihoods to derive an efficient one-versus-rest approximate composite likelihood model for histogram-based random variables, constructed from low-dimensional marginal histograms obtained from the full histogram. We demonstrate that this procedure can achieve classification rates comparable to those of the standard full-data multinomial analysis and of state-of-the-art subsampling algorithms for logistic regression, but at a substantially lower computational cost. Performance is explored through simulated examples, and analyses of large supersymmetry and satellite crop classification datasets. Supplementary materials for this article are available online.
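
To make the idea of fitting a logistic model to histogram summaries concrete, here is a minimal sketch. It is not the composite likelihood procedure of the paper: it assumes a single predictor, a binary response, and a crude approximation in which every observation in a bin is treated as sitting at the bin midpoint. The function name `fit_binned_logistic`, the simulated data, and the choice of 40 bins are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_binned_logistic(midpoints, n_pos, n_neg, iters=50):
    """Newton-Raphson fit of a binary logistic model to per-bin class counts.

    Assumption (not from the paper): every point in a bin is replaced
    by the bin midpoint, so the likelihood only needs the counts.
    """
    X = np.column_stack([np.ones_like(midpoints), midpoints])
    n = n_pos + n_neg                      # total count per bin
    beta = np.zeros(2)                     # [intercept, slope]
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (n_pos - n * p)       # score of the weighted log-likelihood
        hess = (X * (n * p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulate a large dataset from a known logistic model (illustrative values).
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
true_beta = np.array([-0.5, 1.5])
y = rng.random(x.size) < 1.0 / (1.0 + np.exp(-(true_beta[0] + true_beta[1] * x)))

# Summarize the predictor as one histogram per class on shared bin edges.
edges = np.linspace(x.min(), x.max(), 41)
n_pos, _ = np.histogram(x[y], bins=edges)
n_neg, _ = np.histogram(x[~y], bins=edges)
midpoints = 0.5 * (edges[:-1] + edges[1:])

# Inference now touches only 40 bins instead of 100,000 rows.
beta_hat = fit_binned_logistic(midpoints, n_pos, n_neg)
print(beta_hat)
```

Each Newton step here costs work proportional to the number of bins rather than the number of observations, which is the source of the computational saving the abstract refers to. The midpoint shortcut is only one simple way to use the histogram, and it does not extend directly to the multi-class, multi-predictor setting the paper addresses.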




Updated: 2021-04-20