Computer Science, 2025, Vol. 52, Issue (3): 188-196. doi: 10.11896/jsjkx.240100213

• Database & Big Data & Data Science •

FedRCD: A Clustering Federated Learning Algorithm Based on Distribution Extraction and Community Detection

WANG Ruicong, BIAN Naizheng, WU Yingjun

  1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
  • Received: 2024-01-30  Revised: 2024-07-02  Online: 2025-03-15  Published: 2025-03-07
  • Corresponding author: BIAN Naizheng (741565912@qq.com)
  • About author: WANG Ruicong (ruicongwong@hnu.edu.cn), born in 1998, master. His main research interests include federated learning and edge computing.
    BIAN Naizheng, born in 1969, master, associate professor. His main research interests include blockchain, federated learning, software engineering, and big data.
  • Supported by:
    Science and Technology Progress and Innovation Plan of the Department of Transportation of Hunan Province in 2021 (202101-E-34).


Abstract: Clustering clients and conducting federated learning within each cluster is an effective way to mitigate the poor performance of traditional federated learning algorithms in non-independently and identically distributed (Non-IID) data scenarios. Such methods primarily use the parameters of a client's local model to characterize its data, and evaluate similarity through the "distance" between parameters, thereby clustering the clients. However, due to the permutation invariance of neurons in neural networks, this can lead to inaccurate clustering results. Moreover, these methods typically require a predetermined number of clusters, which may produce unreasonable clusters, or they must perform clustering during the algorithm's iterative process, which incurs substantial communication overhead. After an in-depth analysis of the shortcomings of existing methods, a novel federated learning algorithm named FedRCD is proposed. The algorithm combines autoencoders with the K-Means algorithm to extract distribution information directly from a client's dataset to represent its characteristics, thereby reducing the reliance on model parameters. FedRCD also organizes the relationships between clients into a graph structure and employs the Louvain algorithm to construct client clusters; this process does not require presetting the number of clusters, which makes the clustering results more reasonable. Experimental results show that FedRCD more effectively uncovers latent clustering relationships between clients and, across a variety of Non-IID data scenarios, significantly improves the training of neural networks compared with other federated learning algorithms. On the CIFAR10 dataset, the accuracy of FedRCD surpasses the classical FedAvg algorithm by 37.08% and the recently released FeSEM algorithm by 1.89%, while also demonstrating superior fairness.
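The clustering stage described in the abstract can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the clients' distribution vectors are random stand-ins for the autoencoder/K-Means statistics FedRCD extracts, and the cosine-similarity threshold of 0.5 is an assumed parameter. It shows the key property the abstract emphasizes: Louvain community detection on the client graph recovers the groups without any preset number of clusters.

```python
# Sketch of graph construction + Louvain clustering (assumed parameters).
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

rng = np.random.default_rng(0)

# Hypothetical distribution vectors: 3 latent data distributions,
# 4 clients drawn near each one (12 clients total).
centers = rng.normal(size=(3, 8))
clients = np.vstack([c + 0.05 * rng.normal(size=(4, 8)) for c in centers])

def cosine(u, v):
    # Cosine similarity between two distribution vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Organize client relationships as a weighted graph, keeping only
# sufficiently similar pairs (threshold 0.5 is an assumption).
G = nx.Graph()
G.add_nodes_from(range(len(clients)))
for i in range(len(clients)):
    for j in range(i + 1, len(clients)):
        w = cosine(clients[i], clients[j])
        if w > 0.5:
            G.add_edge(i, j, weight=w)

# Louvain maximizes modularity; the number of communities emerges
# from the graph itself rather than being fixed in advance.
communities = louvain_communities(G, weight="weight", seed=0)
print(sorted(sorted(c) for c in communities))
```

Each community found here would then run federated averaging among its own members, so clients with similar data distributions train together.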

Key words: Federated learning, Non-IID, Distribution extraction, Community detection, Louvain algorithm

CLC number: TP393.4
[1] MCMAHAN B, MOORE E, RAMAGE D, et al. Communication-efficient learning of deep networks from decentralized data[C]//PMLR, 2017: 1273-1282.
[2] ZHU H, XU J, LIU S, et al. Federated learning on non-IID data: A survey[J]. Neurocomputing, 2021, 465: 371-390.
[3] GUO G J, TIAN H, PI H J, et al. Advances in Federated Learning for Non-independent Identically Distributed Data[J]. Journal of Chinese Computer Systems, 2023, 44(11): 2442-2449.
[4] ZHAO Y, LI M, LAI L, et al. Federated learning with non-IID data[J]. arXiv:1806.00582, 2018.
[5] LI T, SAHU A K, ZAHEER M, et al. Federated optimization in heterogeneous networks[C]//Proceedings of Machine Learning and Systems. 2020: 429-450.
[6] KARIMIREDDY S P, KALE S, MOHRI M, et al. SCAFFOLD: Stochastic controlled averaging for federated learning[C]//International Conference on Machine Learning. PMLR, 2020: 5132-5143.
[7] ARIVAZHAGAN M G, AGGARWAL V, SINGH A K, et al. Federated learning with personalization layers[J]. arXiv:1912.00818, 2019.
[8] ZOU M H, GAN Z X. Federated Learning Algorithm for Non-IID Data with Partial Device Participation[J]. Journal of Chinese Computer Systems, 2023, 44(6): 1121-1127.
[9] HUANG Y, CHU L, ZHOU Z, et al. Personalized cross-silo federated learning on non-IID data[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021: 7865-7873.
[10] LONG G, XIE M, SHEN T, et al. Multi-Center Federated Learning: Clients Clustering for Better Personalization[J]. arXiv:2005.01026, 2023.
[11] GHOSH A, CHUNG J, YIN D, et al. An efficient framework for clustered federated learning[J]. Advances in Neural Information Processing Systems, 2020, 33: 19586-19597.
[12] SATTLER F, MÜLLER K R, SAMEK W. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 32(8): 3710-3722.
[13] DENNIS D K, LI T, SMITH V. Heterogeneity for the win: One-shot federated clustering[C]//International Conference on Machine Learning. PMLR, 2021: 2611-2620.
[14] LIU B, GUO Y, CHEN X. PFA: Privacy-preserving federated adaptation for effective model personalization[C]//Proceedings of the Web Conference 2021. 2021: 923-934.
[15] YANG Z Q, ZHANG Y G, ZHENG Y, et al. FedFed: Feature distillation against data heterogeneity in federated learning[J]. arXiv:2310.05077, 2024.
[16] LI Q, ZHANG L Y, MENG X Y. A resource-efficient clustering collaborative federated client selection[J/OL]. https://doi.org/10.13229/j.cnki.jdxbgxb.20231369.
[17] LI R J, YAN Q. Inter-cluster Optimization for Cluster Federated Learning[J]. Computer Science, 2023, 50(S2): 543-547.
[18] HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[19] ZHOU Z H. Machine Learning[M]. Beijing: Tsinghua University Press, 2016: 225-242.
[20] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of Statistical Mechanics: Theory and Experiment, 2008, 2008(10): P10008.
[21] LIU T Y. Distributed Machine Learning: Theories, Algorithms, and Systems[M]. Beijing: China Machine Press, 2018: 44-46.
[22] MCINNES L, HEALY J, MELVILLE J. UMAP: Uniform manifold approximation and projection for dimension reduction[J]. arXiv:1802.03426, 2018.
[23] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation applied to handwritten zip code recognition[J]. Neural Computation, 1989, 1(4): 541-551.
[24] RUMELHART D E, HINTON G E, WILLIAMS R J. Learning representations by back-propagating errors[J]. Nature, 1986, 323(6088): 533-536.
[25] LI T, SANJABI M, BEIRAMI A, et al. Fair resource allocation in federated learning[J]. arXiv:1905.10497, 2019.