Efficient fused learning for distributed imbalanced data, Communications in Statistics - Theory and Methods



Efficient fused learning for distributed imbalanced data
Communications in Statistics - Theory and Methods (IF 0.6), Pub Date: 2020-05-12, DOI: 10.1080/03610926.2020.1759641
Jie Zhou₁, Guohao Shen₂, Xuan Chen₁, Yuanyuan Lin₂


Abstract

Any data set with an unequal or highly skewed distribution across its classes can be regarded as imbalanced. Because of privacy concerns and other technical limitations, imbalanced data distributed across locations or machines cannot simply be combined and stored in a single central location. The commonly used naive averaging estimate may be unstable for imbalanced data. In this paper, we propose a fused estimator for logistic regression on distributed imbalanced data that combines all the cases available on all machines and is stable and efficient. The consistency and asymptotic normality of the proposed estimator are established under regularity conditions. Its asymptotic efficiency relative to the oracle estimator based on the entire imbalanced data set is also studied. Extensive simulation studies show that the proposed estimator is as efficient as the oracle estimator in various situations. An application to credit card default payment data illustrates the method.
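To make the two baselines named in the abstract concrete, the following is a minimal sketch that simulates imbalanced logistic-regression data split across machines, then compares the naive average of the per-machine estimates with the oracle fit on the pooled data. Everything specific here is an assumption for illustration and not taken from the paper: the helper names `fit_logistic` and `simulate`, the coefficients in `true_beta`, and the settings `K` and `n_per`. The proposed fused estimator itself is not reproduced, because the abstract does not give its form.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)        # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        grad = X.T @ (y - p)                    # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X  # observed information
        # Tiny ridge term only for numerical stability (an assumption).
        step = np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def simulate(n, beta, rng):
    """Draw n observations from a logistic model with an intercept column."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    y = rng.binomial(1, p).astype(float)
    return X, y


rng = np.random.default_rng(0)
true_beta = np.array([-4.0, 1.0, -0.5])  # hypothetical; large negative intercept makes cases rare
K, n_per = 10, 2000                      # hypothetical: 10 machines, 2000 rows each

X, y = simulate(K * n_per, true_beta, rng)
parts = np.array_split(np.arange(K * n_per), K)

# Naive averaging: fit separately on each machine, then average the estimates.
local = np.array([fit_logistic(X[idx], y[idx]) for idx in parts])
naive_avg = local.mean(axis=0)

# Oracle: a single fit on the pooled data, which a real deployment could not compute.
oracle = fit_logistic(X, y)

print("case rate :", y.mean())
print("naive avg :", naive_avg)
print("oracle    :", oracle)
```

Lowering `n_per` or making the intercept more negative leaves each machine with fewer cases, which is the regime in which the abstract warns that naive averaging may become unstable.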


Updated: 2020-05-12
