计算机科学 ›› 2023, Vol. 50 ›› Issue (3): 351-359. doi: 10.11896/jsjkx.220100016

• 信息安全 •

针对机器学习的成员推断攻击综述

彭钺峰1, 赵波1, 刘会1, 安杨2   

  1 武汉大学国家网络安全学院 武汉 430000
    2 武汉大学计算机学院 武汉 430000
  • 收稿日期:2022-01-04 修回日期:2022-03-27 出版日期:2023-03-15 发布日期:2023-03-15
  • 通讯作者: 赵波(zhaobo@whu.edu.cn)
  • 作者简介:彭钺峰(yuefengpeng@whu.edu.cn)
  • 基金资助:
    国家自然科学基金(U1936122)

Survey on Membership Inference Attacks Against Machine Learning

PENG Yuefeng1, ZHAO Bo1, LIU Hui1, AN Yang2   

  1 School of Cyber Science and Engineering,Wuhan University,Wuhan 430000,China
    2 School of Computer Science,Wuhan University,Wuhan 430000,China
  • Received:2022-01-04 Revised:2022-03-27 Online:2023-03-15 Published:2023-03-15
  • About author:PENG Yuefeng,born in 1998,postgraduate.His main research interests include artificial intelligence security and so on.
    ZHAO Bo,born in 1972,Ph.D,professor,Ph.D supervisor,is a senior member of China Computer Federation.His main research interests include trusted computing and trustworthy artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China(U1936122).

摘要: 近年来,机器学习不仅在计算机视觉、自然语言处理等领域取得了显著成效,也被广泛应用于人脸图像、金融数据、医疗信息等敏感数据处理领域。最近,研究人员发现机器学习模型会记忆它们训练集中的数据,导致攻击者可以对模型实施成员推断攻击,即攻击者可以推断给定数据是否存在于某个特定机器学习模型的训练集。成员推断攻击的成功,可能导致严重的个人隐私泄露。例如,如果能确定某个人的医疗记录属于某医院的数据集,则表明这个人曾经是那家医院的病人。首先介绍了成员推断攻击的基本原理;然后系统地对近年来代表性攻击和防御的研究进行了总结和归类,特别针对不同条件设置下如何进行攻击和防御进行了详细的阐述;最后回顾成员推断攻击的发展历程,探究机器学习隐私保护面临的主要挑战和未来潜在的发展方向。

关键词: 机器学习, 成员推断, 隐私泄露, 隐私保护

Abstract: In recent years,machine learning has not only achieved remarkable results in fields such as computer vision and natural language processing,but has also been widely applied to sensitive data such as face images,financial data and medical information.Recently,researchers have found that machine learning models memorize data from their training sets,making them vulnerable to membership inference attacks,in which an attacker infers whether a given record was part of the training set of a specific machine learning model.A successful membership inference attack may cause serious individual privacy leakage.For example,determining that a person's medical record belongs to a hospital's training set reveals that this person was once a patient of that hospital.This paper first introduces the basic principle of membership inference attacks,then systematically summarizes and classifies representative attacks and defenses proposed in recent years,with a detailed description of how to attack and defend under different settings.Finally,by reviewing the development of membership inference attacks,it explores the main challenges and potential future directions of machine learning privacy protection.
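To make this principle concrete, the following is a minimal, hedged sketch (not code from the surveyed works) of a confidence-threshold membership inference attack in the spirit of Yeom et al.[22] and Salem et al.[21]: the attacker queries a trained target model and guesses "member" whenever the model's confidence in its predicted label exceeds a threshold, exploiting the tendency of models to be more confident on records memorized during training. The dataset, target model, and threshold below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Illustrative data and target model (assumptions for this sketch only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

target_model = RandomForestClassifier(n_estimators=100, random_state=0)
target_model.fit(X_train, y_train)          # members = records used for training

def max_confidence(model, X):
    """Attacker-side feature: the model's confidence in its predicted label."""
    return model.predict_proba(X).max(axis=1)

# Confidence-threshold attack: predict "member" when confidence exceeds a threshold.
threshold = 0.9                              # illustrative; a real attacker would calibrate it
member_scores = max_confidence(target_model, X_train)      # training (member) records
nonmember_scores = max_confidence(target_model, X_out)     # held-out (non-member) records

guesses = np.concatenate([member_scores, nonmember_scores]) > threshold
truth = np.concatenate([np.ones_like(member_scores), np.zeros_like(nonmember_scores)])
attack_accuracy = (guesses == truth).mean()
print(f"membership inference accuracy: {attack_accuracy:.3f}")  # > 0.5 indicates membership leakage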

Key words: Machine learning, Membership inference, Privacy leakage, Privacy protection

中图分类号: TP181
[1]WEYAND T,ARAUJO A,CAO B Y,et al.Google landmarks dataset v2-a large-scale benchmark for instance-level recognition and retrieval[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR).2020:2572-2581.
[2]HENAFF O.Data-efficient image recognition with contrastive predictive coding[C]//2020 International Conference on Machine Learning(ICML).2020:4182-4192.
[3]BROWN T,MANN B,RYDER N,et al.Language models are few-shot learners[J].arXiv:2005.14165,2020.
[4]DEVLIN J,CHANG M W,LEE K,et al.BERT:Pre-training of deep bidirectional transformers for language understanding[C]//2019 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.2019:4171-4186.
[5]DOUMBOUYA M,EINSTEIN L,PIECH C.Using radio archives for low-resource speech recognition:Towards an intelligent virtual assistant for illiterate users[C]//2021 AAAI Conference on Artificial Intelligence(AAAI).2021:14757-14765.
[6]LIU S,GENG M,HU S,et al.Recent progress in the CUHK dysarthric speech recognition system[J].IEEE/ACM Transactions on Audio,Speech,and Language Processing,2021,29:2267-2281.
[7]TAIGMAN Y,YANG M,RANZATO M,et al.Deepface:Closing the gap to human-level performance in face verification[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).2014:1701-1708.
[8]SCHROFF F,KALENICHENKO D,PHILBIN J.Facenet:A unified embedding for face recognition and clustering[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition(CVPR).2015:815-823.
[9]ERICKSON B J,KORFIATIS P,AKKUS Z,et al.Machine learning for medical imaging[J].RadioGraphics,2017,37(2):505-515.
[10]KOUROU K,EXARCHOS T P,EXARCHOS K P,et al.Machine learning applications in cancer prognosis and prediction[J].Computational and Structural Biotechnology Journal,2015,13:8-17.
[11]CARLINI N,LIU C,ERLINGSSON Ú,et al.The secret sharer:Evaluating and testing unintended memorization in neural networks[C]//28th USENIX Security Symposium(USENIX Security 19).2019:267-284.
[12]SONG C,RISTENPART T,SHMATIKOV V.Machine learning models that remember too much[C]//Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security(CCS 17).2017:587-601.
[13]LEINO K,FREDRIKSON M.Stolen memories:Leveraging model memorization for calibrated white-box membership inference[C]//29th USENIX Security Symposium(USENIX Security 20).2020:1605-1622.
[14]TRAMÈR F,ZHANG F,JUELS A,et al.Stealing machine learning models via prediction APIs[C]//25th USENIX Security Symposium(USENIX Security 16).2016:601-618.
[15]OH S J,AUGUSTIN M,FRITZ M,et al.Towards reverse-engineering black-box neural networks[C]//2018 International Conference on Learning Representations(ICLR).2018:1-20.
[16]YU H,YANG K,ZHANG T,et al.Cloudleak:Large-scale deep learning models stealing through adversarial examples[C]//2020 Network and Distributed System Security Symposium(NDSS).2020:1-16.
[17]FREDRIKSON M,JHA S,RISTENPART T.Model inversion attacks that exploit confidence information and basic countermeasures[C]//2015 ACM SIGSAC Conference on Computer and Communications Security(CCS 15).2015:1322-1333.
[18]ZHANG Y,JIA R,PEI H,et al.The secret revealer:Generative model-inversion attacks against deep neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR).2020:250-258.
[19]MEHNAZ S,LI N,BERTINO E.Black-box model inversion attribute inference attacks on classification models[J].arXiv:2012.03404,2020.
[20]SHOKRI R,STRONATI M,SONG C,et al.Membership inference attacks against machine learning models[C]//2017 IEEE Symposium on Security and Privacy(SP).2017:3-18.
[21]SALEM A,ZHANG Y,HUMBERT M,et al.Ml-leaks:Model and data independent membership inference attacks and defenses on machine learning models[C]//2019 Network and Distributed Systems Security(NDSS) Symposium.2019:1-15.
[22]YEOM S,GIACOMELLI I,FREDRIKSON M,et al.Privacy risk in machine learning:Analyzing the connection to overfitting[C]//2018 IEEE 31st Computer Security Foundations Symposium(CSF).2018:268-282.
[23]GERUM R C,ERPENBECK A,KRAUSS P,et al.Sparsity through evolutionary pruning prevents neuronal networks from overfitting[J].Neural Networks,2020,128:305-312.
[24]SONG X,JIANG Y,TU S,et al.Observational overfitting in reinforcement learning[C]//2020 International Conference on Learning Representations(ICLR).2020:1-29.
[25]RICE L,WONG E,KOLTER Z.Overfitting in adversarially robust deep learning[C]//The 37th International Conference on Machine Learning(ICML).2020:8093-8104.
[26]CHOQUETTE-CHOO C A,TRAMÈR F,CARLINI N,et al.Label-only membership inference attacks[C]//The 38th International Conference on Machine Learning(ICML).2021:1964-1974.
[27]KRIZHEVSKY A,HINTON G,et al.Learning multiple layers of features from tiny images[R].Technical report,University of Toronto,2009.
[28]NASR M,SHOKRI R,HOUMANSADR A.Comprehensive privacy analysis of deep learning:Passive and active white-box inference attacks against centralized and federated learning[C]//2019 IEEE Symposium on Security and Privacy(SP).2019:739-753.
[29]HUI B,YANG Y,YUAN H,et al.Practical blind membership inference attack via differential comparisons[C]//2021 Network and Distributed System Security Symposium(NDSS).2021:1-17.
[30]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[J].Journal of Privacy and Confidentiality,2017,7(3):17-51.
[31]SABLAYROLLES A,DOUZE M,SCHMID C,et al.White-box vs black-box:Bayes optimal strategies for membership inference[C]//The 36th International Conference on Machine Learning(ICML).2019:5558-5567.
[32]ABADI M,CHU A,GOODFELLOW I,et al.Deep learning with differential privacy[C]//2016 ACM SIGSAC Conference on Computer and Communications Security(CCS 16).2016:308-318.
[33]NASR M,SHOKRI R,HOUMANSADR A.Machine learning with membership privacy using adversarial regularization[C]//2018 ACM SIGSAC Conference on Computer and Communications Security(CCS 18).2018:634-646.
[34]SONG L W,MITTAL P.Systematic evaluation of privacy risks of machine learning models[C]//30th USENIX Security Symposium(USENIX Security 21).2021:2615-2632.
[35]JIA J,SALEM A,BACKES M,et al.Memguard:Defending against black-box membership inference attacks via adversarial examples[C]//2019 ACM SIGSAC Conference on Computer and Communications Security(CCS 19).2019:259-274.
[36]PAPERNOT N,MCDANIEL P,GOODFELLOW I,et al.Practical black-box attacks against machine learning[C]//2017 ACM on Asia Conference on Computer and Communications Security(AsiaCCS 17).2017:506-519.
[37]CARLINI N,WAGNER D.Towards evaluating the robustness of neural networks[C]//2017 IEEE Symposium on Security and Privacy(SP).2017:39-57.
[38]TRAMÈR F,KURAKIN A,PAPERNOT N,et al.Ensemble adversarial training:Attacks and defenses[C]//2018 International Conference on Learning Representations(ICLR).2018:1-20.
[39]HINTON G,VINYALS O,DEAN J.Distilling the knowledge in a neural network[J].arXiv:1503.02531,2015.
[40]DU S,YOU S,LI X,et al.Agree to disagree:Adaptive ensemble knowledge distillation in gradient space[C]//2020 Advances in Neural Information Processing Systems(NeurIPS 20).2020:12345-12355.
[41]SHEJWALKAR V,HOUMANSADR A.Membership privacy for machine learning models through knowledge transfer[C]//2021 AAAI Conference on Artificial Intelligence.2021:9549-9557.
[42]TRUEX S,LIU L,GURSOY M,et al.Demystifying membership inference attacks in machine learning as a service[J].IEEE Transactions on Services Computing,2021,14(6):2073-2089.
[43]GOODFELLOW I,POUGET-ABADIE J,MIRZA M,et al.Generative adversarial nets[C]//2014 Advances in Neural Information Processing Systems(NeurIPS 14).2014:1-9.
[44]CHO J H,HARIHARAN B.On the efficacy of knowledge distillation[C]//2019 IEEE/CVF International Conference on Computer Vision(ICCV).2019:4793-4801.