Computer Science, 2022, Vol. 49, Issue (6A): 232-237. doi: 10.11896/jsjkx.211100059

• Big Data & Data Science •

Clustered Federated Learning Methods Based on DBSCAN Clustering

LU Chen-yang, DENG Su, MA Wu-bin, WU Ya-hui, ZHOU Hao-hao

  1. Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: MA Wu-bin (wb_ma@nudt.edu.cn)
  • About author: LU Chen-yang (luchenyang97@163.com), born in 1997, postgraduate. His main research interests include federated learning and machine learning.
    MA Wu-bin, born in 1986, Ph.D, lecturer. His main research interests include data engineering and cyber-physical systems.
  • Supported by:
    General Program of National Natural Science Foundation of China (61871388).

Abstract: Federated learning addresses data fragmentation and isolation in machine learning under the premise of privacy protection. Each client node trains on its local data and uploads the resulting model parameters to a central server, which aggregates them to achieve joint training. In real environments, the data distributions across nodes are often inconsistent. An analysis of how non-independent and identically distributed (non-IID) data affect the accuracy of federated learning shows that models obtained by traditional federated learning methods have low accuracy. Therefore, diversified sampling strategies are adopted to simulate skewed data distributions, and a clustered federated learning algorithm based on DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering, DCFL (DBSCAN Based Cluster Federated Learning), is proposed to address the loss of learning accuracy caused by non-IID data across nodes. Experiments on the MNIST and CIFAR-10 standard datasets show that, compared with traditional federated learning algorithms, DCFL considerably improves model accuracy.

Key words: Client selection, Clustering, Data distribution, Federated learning, Training optimization
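
The general idea summarized in the abstract (group clients whose data look alike, then aggregate parameters within each group) can be illustrated with a minimal sketch. This is not the authors' DCFL implementation. The abstract does not state which client features are clustered or how aggregation is weighted, so the sketch assumes that each client is described by a label-distribution vector and that each cluster is aggregated with a sample-size-weighted average in the style of FedAvg. All names (`dbscan`, `weighted_average`, `clustered_aggregate`) and the values of `eps` and `min_samples` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: one cluster label per point, -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Includes the point itself, as in the usual min_samples convention.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


def weighted_average(params, sizes):
    """Average parameter vectors, weighted by local sample count."""
    total = sum(sizes)
    return [
        sum(p[k] * n for p, n in zip(params, sizes)) / total
        for k in range(len(params[0]))
    ]


def clustered_aggregate(features, params, sizes, eps, min_samples):
    """Cluster clients by feature vector, then aggregate per cluster."""
    labels = dbscan(features, eps, min_samples)
    models = {}
    for c in sorted(set(labels) - {NOISE}):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = weighted_average(
            [params[i] for i in idx], [sizes[i] for i in idx]
        )
    return labels, models


# Toy data (assumed): five clients, two classes, two model parameters each.
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.5, 0.5]]
params = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [20.0, 0.0], [0.0, 0.0]]
sizes = [100, 300, 50, 50, 10]

labels, models = clustered_aggregate(features, params, sizes, eps=0.2, min_samples=2)
print(labels)  # [0, 0, 1, 1, -1]
print(models)  # {0: [2.5, 3.5], 1: [15.0, 0.0]}
```

In this toy run the two clients skewed toward the first class form one cluster, the two skewed toward the second class form another, and the balanced client is left as noise. How noise clients should be handled in a real system is a design decision this sketch does not address.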

CLC Number:

  • TP301