Computer Science ›› 2022, Vol. 49 ›› Issue (12): 66-73. doi: 10.11896/jsjkx.220600034

• Federated Learning

FL-GRM: Gamma Regression Algorithm Based on Federated Learning

GUO Yan-qing1, LI Yu-hang1, WANG Wan-wan2, FU Hai-yan1, WU Ming-kan1, LI Yi1

  1. 1 School of Information and Communication Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China
    2 Research Center of InsightOne Tech Co., Ltd., Beijing 100028, China
  • Received: 2022-06-06 Revised: 2022-08-29 Online: 2022-12-15 Published: 2022-12-14
  • Corresponding author: FU Hai-yan (fuhy@dlut.edu.cn)
  • About author: GUO Yan-qing (guoyq@dlut.edu.cn), born in 1980, Ph.D, professor, Ph.D supervisor. His main research interests include machine learning, computer vision and cyberspace security. FU Hai-yan, born in 1981, Ph.D, senior engineer. Her main research interests include federated learning, image retrieval and computer vision.
  • Supported by:
    National Natural Science Foundation of China (62076052, 62106037, U1936117), Fundamental Research Funds for the Central Universities (DUT20TD110, DUT20RC(3)088), Major Program of the National Social Science Foundation of China (19ZDA127) and Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (202100032).

Abstract: In fields such as hydrology, meteorology and insurance claim assessment, the dependent variable is commonly assumed to follow a Gamma distribution. Under this assumption, a Gamma regression model fits the data better than a multivariate linear regression model. Gamma regression models have previously been obtained by pooling the data in one place for training. When the data are provided by multiple parties, training a privacy-preserving Gamma regression model without exchanging the data becomes a problem that needs to be solved. To this end, a multi-party secure vertical federated Gamma regression algorithm is proposed. First, the log-likelihood expression of the vertical federated Gamma regression model is derived with an iterative method. Second, the link function of the model is determined according to engineering practice, and a loss function is constructed to establish the gradient update strategy for the parameters. Finally, the homomorphically encrypted parameters of all parties are fused and updated to obtain the federated Gamma regression model. Performance tests on two public datasets show that, without exchanging data, the proposed algorithm can effectively exploit the value of multi-party data to produce a Gamma regression model. Its fitting performance approaches that of a Gamma regression model learned on centralized data and is better than that of a Gamma regression model learned independently by a single party.

Key words: Federated learning, Gamma regression, Homomorphic encryption, Privacy protection, Secure multi-party computation

CLC Number: 

  • TP181
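
The training flow summarized in the abstract (each party holds part of the features, the parties combine intermediate results, and each updates its own parameters by gradient descent) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's FL-GRM protocol: it assumes a log link (the abstract does not say which link function was chosen), it ignores the Gamma shape parameter in the loss, and it exchanges values in plaintext where the paper uses homomorphic encryption. The names `gamma_nll`, `train_vertical_gamma` and `simulate` are hypothetical.

```python
import numpy as np

def gamma_nll(eta, y):
    """Mean Gamma negative log-likelihood under a log link (assumed),
    dropping terms that depend only on the shape parameter."""
    return float(np.mean(y * np.exp(-eta) + eta))

def train_vertical_gamma(parts, y, lr=0.1, n_iter=500):
    """Plaintext gradient descent over vertically partitioned features.

    parts: list of feature blocks, one per party, all with the same rows.
    Returns one weight vector per party.
    """
    weights = [np.zeros(X.shape[1]) for X in parts]
    for _ in range(n_iter):
        # Each party computes its partial linear predictor locally.
        partial = [X @ w for X, w in zip(parts, weights)]
        # Only the partial predictors are combined; a real protocol
        # would encrypt these intermediate values (assumption: omitted here).
        eta = np.sum(partial, axis=0)
        # Derivative of the per-sample loss with respect to eta.
        residual = 1.0 - y * np.exp(-eta)
        # Each party updates its own weights from the shared residual.
        weights = [w - lr * (X.T @ residual) / len(y)
                   for X, w in zip(parts, weights)]
    return weights

def simulate(seed=0, n=2000):
    """Synthetic Gamma-distributed responses with a log-linear mean."""
    rng = np.random.default_rng(seed)
    X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 4))])
    true_w = np.array([0.5, 0.3, -0.2, 0.4, 0.1])
    mu = np.exp(X @ true_w)
    shape = 2.0
    y = rng.gamma(shape, mu / shape)  # mean of each draw equals mu
    return X, y, true_w
```

In this sketch, training with `parts=[X[:, :3], X[:, 3:]]` follows the same gradient path as training on the pooled matrix `[X]`, while training on `[X[:, :3]]` alone corresponds to a single party learning from its own features only.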