Federated Data Augmentation Algorithm for Non-independent and Identical Distributed Data

QU Xiang-mou, WU Ying-bo, JIANG Xiao-ling

School of Big Data & Software Engineering, Chongqing University, Chongqing 401331, China

Computer Science, 2022, Vol. 49, Issue (12): 33-39. doi: 10.11896/jsjkx.220300031

Received: 2022-03-02  Revised: 2022-06-13  Published: 2022-12-14
Corresponding author: WU Ying-bo (wyb@cqu.edu.cn)
About the authors (201924021004@cqu.edu.cn): QU Xiang-mou, born in 1998, postgraduate, is a member of China Computer Federation. His main research interests include federated learning and data security. WU Ying-bo, born in 1983, Ph.D., professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include machine learning, intelligent optimization and decision.
Supported by:
National Key R&D Program of China (2019YFB1706101), Science-Technology Foundation of Chongqing (cstc2019jscx-mbdxX0047) and Fundamental Research Funds for the Central Universities (2020CDCGRJ50).

Abstract: In federated learning, users' local data distributions vary with their location and preferences. Under such non-independent and identically distributed (Non-IID) data, a user's local data may lack some label categories, which significantly affects the update rate and the final performance of the global model in federated aggregation. To solve this problem, a federated data augmentation algorithm based on a conditional generative adversarial network (FDA-cGAN) is proposed. Without compromising user privacy, it uses the generative adversarial network model to augment a small amount of data for participants with skewed data, greatly improving the performance of federated learning under Non-IID data partitions. Experimental results show that, compared with the current mainstream federated averaging algorithm, the prediction accuracy under the Non-IID setting improves by 1.18% on MNIST and by 14.6% on CIFAR-10, which demonstrates the effectiveness and practicability of the proposed algorithm for Non-IID data problems in federated learning.

Key words: Federated learning, Privacy-preserving, Generative adversarial network, Differential privacy, Data augmentation
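
The abstract describes augmenting participants whose local data lack some label categories. As a rough illustration only, and not the authors' FDA-cGAN implementation, the sketch below shows how a client might work out which classes are missing or under-represented and how many synthetic samples it would request per class from a conditional generator. The function name `augmentation_plan`, the `target_per_class` parameter, and the toy label list are assumptions made for this example; the generator itself is omitted.

```python
from collections import Counter


def augmentation_plan(labels, num_classes, target_per_class):
    """Return {class_id: n_synthetic} for classes below target_per_class.

    Hypothetical helper: the abstract does not state how many samples are
    generated per class, so target_per_class is an assumed knob.
    """
    counts = Counter(labels)
    plan = {}
    for c in range(num_classes):
        deficit = target_per_class - counts.get(c, 0)
        if deficit > 0:
            plan[c] = deficit  # samples to request, conditioned on label c
    return plan


# Toy label-skewed client: only classes 0 and 1 are present locally.
client_labels = [0, 0, 0, 1, 1]
plan = augmentation_plan(client_labels, num_classes=4, target_per_class=2)
print(plan)
```

In a full pipeline, each entry of the plan would be passed as the conditioning label to a trained conditional generator, and the generated samples would be added to the client's local training set before the usual federated averaging round.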

CLC Number: TP391

QU Xiang-mou, WU Ying-bo, JIANG Xiao-ling. Federated Data Augmentation Algorithm for Non-independent and Identical Distributed Data[J]. Computer Science, 2022, 49(12): 33-39. https://doi.org/10.11896/jsjkx.220300031
