Computer Science, 2021, Vol. 48, Issue (8): 32-40. doi: 10.11896/jsjkx.201000093

• Database & Big Data & Data Science

Improved Federated Average Algorithm Based on Tomographic Analysis

LUO Chang-yin1,2,3, CHEN Xue-bin1,2,3, MA Chun-di1, ZHANG Shu-fen1,2,3

  1 College of Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  2 Hebei Province Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei 063210, China
  3 Tangshan Data Science Key Laboratory, North China University of Science and Technology, Tangshan, Hebei 063210, China
  • Received: 2020-08-14 Revised: 2021-01-03 Published: 2021-08-10
  • Corresponding author: CHEN Xue-bin (chxb@qq.com)
  • About author: LUO Chang-yin, born in 1994, master, is a member of China Computer Federation. His main research interests include data security. (1394301218@qq.com) CHEN Xue-bin, born in 1970, professor, Ph.D, is a distinguished member of China Computer Federation. His main research interests include data security, Internet of things security and network security.
  • Supported by:
    National Natural Science Foundation of China (61572170, 61170254) and Tangshan Science and Technology Project (18120203A).

Abstract: The federated averaging (Fedavg) algorithm updates the global model through a weighted aggregation of client updates. When computing these weights, it considers only the amount of data held by each client and ignores the effect of data quality on the model. To address this problem, this paper proposes an improved federated averaging algorithm based on the analytic hierarchy process (AHP), handling multi-source data from the perspective of data quality for the first time. First, the entropy weight method is used to compute the importance of each attribute in the data. These values serve as the criterion-layer values in the AHP, from which the data quality of each client is computed. The quality score is then combined with each client's data volume to recompute the weights used to update the global model. Simulation results show that on small and medium-sized data sets the model trained with a support vector machine achieves the highest accuracy, 85.7152%, while on large data sets the model trained with random forest achieves the highest accuracy, 91.9321%. Compared with the traditional federated averaging method, the proposed method improves accuracy by 3.5% on small and medium-sized data sets and by 1.3% on large data sets, improving model accuracy while also improving the security of the data and the model.

Key words: Federated average (Fedavg), Entropy weight method, Analytic hierarchy process, Weight update

CLC Number: TP391
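
The pipeline outlined in the abstract (entropy weights for attributes, a per-client quality score, then quality-aware aggregation weights) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names, the per-attribute score matrix passed to `client_quality`, the weighted sum that stands in for the AHP criterion-layer computation, and the product rule that combines data volume with quality are all hypothetical.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weight method for a non-negative (n_samples, n_attributes) matrix.

    Attributes whose values are spread more unevenly across samples get
    lower entropy and therefore a larger weight.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")
    col_sum = X.sum(axis=0)
    if np.any(col_sum <= 0) or np.any(X < 0):
        raise ValueError("X must be non-negative with positive column sums")
    P = X / col_sum                      # share of each sample per attribute
    # Treat 0 * log(0) as 0 by only taking the log where P > 0.
    logP = np.log(P, out=np.zeros_like(P), where=P > 0)
    e = -(P * logP).sum(axis=0) / np.log(n)   # entropy in [0, 1]
    d = 1.0 - e                               # degree of divergence
    if np.isclose(d.sum(), 0.0):
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


def client_quality(scores, attr_weights):
    """ASSUMPTION: quality is a weighted sum of per-attribute scores.

    scores has shape (n_clients, n_attributes); this stands in for the
    AHP step, whose exact form is not given in the abstract.
    """
    return np.asarray(scores, dtype=float) @ np.asarray(attr_weights, dtype=float)


def aggregation_weights(n_samples, quality):
    """ASSUMPTION: weight of client k is proportional to n_k * q_k."""
    raw = np.asarray(n_samples, dtype=float) * np.asarray(quality, dtype=float)
    return raw / raw.sum()


def aggregate(client_params, alpha):
    """Weighted average of client parameter vectors (one row per client)."""
    return np.tensordot(np.asarray(alpha, dtype=float),
                        np.asarray(client_params, dtype=float), axes=1)
```

Under this assumed product rule, clients with equal quality scores receive the standard Fedavg weights n_k / n, so the sketch reduces to plain federated averaging when data quality is uniform.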