Computer Science ›› 2022, Vol. 49 ›› Issue (4): 362-368. doi: 10.11896/jsjkx.210300032

• Information Security •

Study on Differential Privacy Protection for Medical Set-Valued Data

WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can

  1. School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
  • Received: 2021-03-02 Revised: 2021-08-07 Published: 2022-04-01
  • Corresponding author: GAO Fu-xiang (gaofuxiang@mail.neu.edu.cn)
  • Author e-mail: 641234923@qq.com
  • About author: WANG Mei-shan, born in 1996, postgraduate. Her main research interests include privacy protection. GAO Fu-xiang, born in 1961, Ph.D., professor. His main research interests include computer network security and embedded computer networks.

Abstract: The continuing development of information technology and healthcare digitalization has produced medical data at large scale, laying the groundwork for deeper applications such as data analysis, data mining, and intelligent diagnosis. Medical datasets are massive and contain a great deal of private patient information, so protecting patient privacy while still using the data is highly challenging. Privacy protection in the medical field currently relies mainly on anonymization, but when an attacker holds strong background knowledge, such methods cannot balance the privacy and the availability of a dataset. This paper therefore proposes an optimized classification tree algorithm and improves the Diffpart partitioning algorithm. Taking the association among data items as a premise, it selects appropriate data from a medical set-valued dataset and perturbs them with noise using differential privacy (DP), so that the result satisfies differential privacy perturbation and supports statistical queries. Tests on a real medical dataset of more than 240,000 records show that the proposed algorithm satisfies the differential privacy distribution and achieves higher privacy and utility than Diffpart.

Key words: Data utility, Differential privacy, Medical big data, Privacy protection, Set-valued data
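
To make the idea of "noise perturbation that supports statistical queries" concrete, the following is a minimal sketch of the standard Laplace mechanism applied to per-item counts over set-valued records. It is not the paper's optimized classification tree, nor Diffpart; it only illustrates the basic building block that such algorithms rely on. The function names (`laplace_noise`, `noisy_item_counts`), the `max_items` truncation bound, and the toy diagnosis codes are all assumptions introduced here for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_counts(records, universe, epsilon, max_items, rng=None):
    """Return Laplace-perturbed counts of how many records contain each item.

    records   : iterable of sets of items (set-valued data)
    universe  : list of distinct items that may be reported
    epsilon   : privacy budget (> 0)
    max_items : hypothetical cap on items counted per record; adding or
                removing one record then changes at most `max_items` counts
                by 1 each, so the L1 sensitivity is `max_items`
    """
    if epsilon <= 0 or max_items <= 0:
        raise ValueError("epsilon and max_items must be positive")
    rng = random.Random() if rng is None else rng

    counts = {item: 0 for item in universe}
    for record in records:
        # Deterministic, data-independent truncation: keep the first
        # `max_items` known items in sorted order.
        kept = [item for item in sorted(set(record)) if item in counts]
        for item in kept[:max_items]:
            counts[item] += 1

    scale = max_items / epsilon  # Laplace scale = sensitivity / epsilon
    return {item: c + laplace_noise(scale, rng) for item, c in counts.items()}


if __name__ == "__main__":
    toy_records = [{"flu", "cough"}, {"flu", "fever", "cough"}, {"asthma"}]
    toy_universe = ["asthma", "cough", "fever", "flu"]
    print(noisy_item_counts(toy_records, toy_universe, epsilon=1.0,
                            max_items=2, rng=random.Random(0)))
```

A smaller `epsilon` yields a larger noise scale and therefore stronger privacy at the cost of less accurate counts, which is the privacy/utility trade-off the abstract refers to. A production implementation would also need to account for floating-point issues in noise sampling, which this sketch ignores.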

CLC Number:

  • TP309.2