Research on privacy protection of multi-source data based on an improved GBDT federated ensemble method with different metrics
Physical Communication ( IF 2.2 ) Pub Date : 2021-08-31 , DOI: 10.1016/j.phycom.2021.101347
Changyin Luo 1, 2, 3 , Xuebin Chen 1, 2, 3 , Jingcheng Xu 1 , Shufen Zhang 1, 2, 3

Federated learning is a hot topic in the field of multi-source data. Its advantage is that raw data can remain local, but it suffers from two problems: the parameters of local models are difficult to integrate, and model accuracy is low. In this paper, an improved gradient boosting decision tree (GBDT) federated ensemble learning method is proposed, which takes the average gradient of similar samples, combined with a sample's own gradient, as the new gradient in order to improve the accuracy of the local model. Different ensemble learning methods are then used to integrate the parameters of the local models, improving the accuracy of the updated global model. Experimental results on the magic-gamma-telescope and eeg-eye-state datasets show that, compared with the GBDT federated ensemble method without similar-sample search and with traditional multi-source data processing technology, the accuracy of the global model is improved by 0.6%, 0.45%, 0.27%, and 0.08%, respectively, making it possible to improve the security of the data and the model while also improving model accuracy.
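The core idea of the abstract, replacing each sample's gradient with a combination of its own gradient and the average gradient of similar samples, can be sketched as follows. This is an illustrative outline, not the authors' code: the similarity metric (Euclidean distance), the neighbourhood size `k`, and the equal-weight blend are all assumptions for the sake of the example.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors (one possible metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def new_gradients(features, gradients, k=2):
    """For each sample, average the gradients of its k nearest neighbours
    (by Euclidean distance) and blend that average with the sample's own
    gradient to form the new gradient used for local GBDT training.

    The 50/50 blend below is a hypothetical choice, not from the paper.
    """
    updated = []
    for i, fi in enumerate(features):
        # Rank the other samples by distance to sample i.
        order = sorted(
            (j for j in range(len(features)) if j != i),
            key=lambda j: euclidean(fi, features[j]),
        )
        neighbours = order[:k]
        avg = sum(gradients[j] for j in neighbours) / k
        # Combine the neighbourhood average with the sample's own gradient.
        updated.append((avg + gradients[i]) / 2)
    return updated
```

With two well-separated clusters, each sample's new gradient is pulled toward its neighbours: `new_gradients([[0, 0], [0, 1], [10, 10], [10, 11]], [1.0, 2.0, 10.0, 12.0], k=1)` returns `[1.5, 1.5, 11.0, 11.0]`.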




Updated: 2021-09-15