Computer Science ›› 2022, Vol. 49 ›› Issue (2): 162-173.doi: 10.11896/jsjkx.201200008

• Database & Big Data & Data Science • Previous Articles     Next Articles

Review of K-means Algorithm Optimization Based on Differential Privacy

KONG Yu-ting, TAN Fu-xiang, ZHAO Xin, ZHANG Zheng-hang, BAI Lu, QIAN Yu-rong   

  1. College of Software,Xinjiang University,Urumqi 830000,China
    Key Laboratory of Signal Detection & Processing in Xinjiang Autonomous Region,Xinjiang University,Urumqi 830046,China
    Key Laboratory of Software Engineering,Xinjiang University,Xinjiang University,Urumqi 830000,China
  • Received:2020-12-01 Revised:2021-03-29 Online:2022-02-15 Published:2022-02-23
  • About author:KONG Yu-ting,born in 1997,postgra-duate,is a member of China Computer Federation.Her main research interests include data mining and data security.
    QIAN Yu-rong,born in 1980,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.Her main research interests include network computing and remote sensing image processing.
  • Supported by:
    National Natural Science Foundation of China(61966035),International Cooperation Project of the Science and Technology Department of the Autonomous Region(2020E01023) and Autonomous Region Graduate Research Innovation Project(XJ2019G072).

Abstract: Differential privacy K-means algorithm (DP K-means),as a kind of privacy preserving data mining (PPDM) model based on differential privacy technology,has attracted much attention from researchers because of its simplicity,efficiency and ability to guarantee data privacy.Firstly,the principle and privacy attack model of differential privacy K-means Algorithm are described,and the shortcomings of the algorithm are analyzed.Then,the advantages and disadvantages of the improvement research of DP K-means algorithm are discussed and analyzed from three perspectives,including data preprocessing,privacy budget allocation and cluster partition,and the relevant data sets and common evaluation indexes in the research are summarized.At last,the challenging problems to be solved in the improvement research of DP K-means algorithm are pointed out,and the future development trend of DP K-means algorithm is prospected.

Key words: Differential privacy, Differential privacy K-means algorithm, Privacy preservation, Privacy preserving data mining

CLC Number: 

  • TP309
[1]GAO Z,SUN Y,CUI X,et al.Privacy-Preserving Hybrid K-Means[J].International Journal of Data Warehousing and Mi-ning,2018,14(2):1-17.
[2]NELSON B,OLOVSSON T.Security and privacy for big data:A systematic literature review[C]//IEEE International Confe-rence on Big Data.IEEE,2017.
[3]ZHANG X J,YANG H Y,LI Z,et al.Differentially Private Location Privacy-preserving Scheme with Semantic Location[J].Computer Science,2021,48(8):300-308.
[4]PENG C C,CHEN Y L,XUN Y M.k-modes Clustering Gua-ranteeing Local Differential Privacy[J].Computer Science,2021,48(2):105-113.
[5]ZHOU S G,LI F,TAO Y F,et al.Privacy Preservation in Database Applications:A Survey[J].Chinese Journal of Computers,2009,32(5):847-861.
[6]MIN Z,YANG G,SANGAIAH A K,et al.A privacy protection-oriented parallel fully homomorphic encryption algorithm in cyber physical systems[J].EURASIP Journal on Wireless Communications and Networking,2019,2019(1):15.
[7]LIU J,YIN S L,LI H,et al.A density-based clustering method for k-anonymity privacy protection[J].Journal of Information Hiding and Multimedia Signal Processing,2017,8(1):12-18.
[8]TEMUUJIN O,AHN J,IM D H.Efficient L-Diversity algo-rithm for preserving privacy of dynamically published datasets[J].IEEE Access,2019,197:122878-122888.
[9]DWORK C.Differential Privacy:A Survey of Results[C]//Theory and Applications of Models of Computation(TAMC 2008).Springer,2008:1-19.
[10]FANG Y,ZHU J,ZHOU W,et al.A Survey on Data Mining Privacy Protection Algorithms[J].Netinfo Security,2017,2:6-11.
[11]WANG J Y,LIU C,FU X C,et al.Crucial Patterns Mining with Differential Privacy over Data Streams[J].Journal of Software,2019,30(3):648-666.
[12]INAN A,GURSOY M E,SAYGIN Y.Sensitivity analysis fornon-interactive differential privacy:bounds and efficient algorithms[J].IEEE Transactions on Dependable and Secure Computing,2020,17(1):194-207.
[13]GAO Z Q,WANG Y T.Survey on differential privacy and its progress[J].Journal on Communications,2017,38:151-155.
[14]BLUM A,DWORK C,MCSHERRY F,et al.Practical Privacy:The SuLQ Framework[C]//Proceedings of the Twenty-fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems(PODS '05).2016:128-138.
[15]MCSHERRY F D.Privacy integrated queries:an extensible platform for privacy-preserving data analysis[C]//Proceedings of the 2009 ACM SIGMOD International Conference on Management of data.ACM,2019:19-30.
[16]NISSIM K,RASKHODNIKOVA S,SMITH A.Smooth sensitivity and sampling in private data analysis[C]//Thirty-ninth ACM Symposium on Theory of Computing.ACM,2007.
[17]FELDMAN D,FIAT A,KAPLAN H,et al.Private coresets[C]//Proceedings of the 41st Annual ACM Symposium on Theory of Computing,STOC 2009.Bethesda,MD,USA:ACM,2009.
[18]MOHAN P,THAKURTA A,SHI E,et al.GUPT:privacy preserving data analysis made easy[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data.2012:349-360.
[19]FLETCHER S,ISLAM M Z.Decision tree classification withdifferential privacy:A survey[J].ACM Computing Surveys,2019,52(4):1-33.
[20]HAY M,MACHANAVAJJHALA A,MIKLAU G,et al.Principled Evaluation of Differentially Private Algorithms using DPBench[C]//ACM.2016:139-154.
[21]YE Q Q,MENG X F,ZHU M J,et al.Survey on Local Differential Privacy[J].Journal of Software,2018,29(7):1981-2005.
[22]HE X M,WANG X Y,CHEN H H.Study on choosing the parameter ε in differential privacy[J].Journal on Communications,2015,36(12):124-130.
[23]CONSUL S,WILLIAMSON S.Differentially Private MedianForests for Regression and Classification[J].arXiv:2006.08795.
[24]ALVIM M,CHATZIKOKOLAKIS K,PALAMIDESSI C,et al.Invited Paper:Local Differential Privacy on Metric Spaces:Optimizing the Trade-Off with Utility[C]//2018 IEEE 31st Computer Security Foundations Symposium (CSF).IEEE,2018:262-267.
[25]BUN M,STEINKE T.Average-Case Averages:Private Algo-rithms for Smooth Sensitivity and Mean Estimation[J].Advances in Neural Information Processing Systems,2019,17:181-191.
[26]WANG H,XU Z,XIONG L,et al.Conducting Correlated La-place Mechanism for Differential Privacy[C]//International Conference on Cloud Computing and Security.Cham:Springer,2017.
[27]DONG J,DURFEE D,ROGERS R.Optimal Differential Privacy Composition for Exponential Mechanisms and the Cost of Adaptivity[J].arXiv:1909.13830.
[28]YANG X,WANG T,REN X,et al.Survey on Improving Data Utility in Differentially Private Sequential Data Publishing[J].IEEE Transactions on Big Data,2021,7(4):729-749.
[29]LIU F.Generalized Gaussian Mechanism for Differential Privacy[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(4):747-756.
[30]TALWAR K,HARDT M A W.Geometric mechanism for privacy-preserving answers:U.S.Patent 8,661,047[P].2014-2-25.
[31]LI C,MIKLAU G,HAY M,et al.The matrix mechanism:optimizing linear counting queries under differential privacy[J].VLDB Journal — the International Journal on Very Large Data Bases,2015,24(6):757-781.
[32]ZHAO J,CHEN Y,ZHANG W.Differential Privacy Preservation in Deep Learning:Challenges,Opportunities and Solutions[J].IEEE Access,2019,7:48901-48911.
[33]GONG M,XIE Y,PAN K,et al.A Survey on Differentially Private Machine Learning[J].IEEE Computational Intelligence Magazine,2020,15(2):49-64.
[34]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of cryptography conference.Springer,2006:265-284.
[35]LIN W C,TSAI C F,KE S W,et al.Top 10 data mining techniques in business applications:a brief survey[J].Kybernetes,2017,46(7):1158-1170.
[36]XIONG J,REN J,CHEN L,et al.Enhancing privacy and availability for data clustering in intelligent electrical service of IoT[J].IEEE Internet of Things Journal,2018,6(2):1530-1540.
[37]DWORK C,ROTHBLUM G N,VADHAN S P.Boosting andDifferential Privacy[C]//2010 IEEE 51st Annual Symposium on Foundations of Computer Science.IEEE,2010:51-60.
[38]YANG J C,ZHAO C.A Survey on K-Means Clustering Algorithm[J].Computer Engineering and Applications,2019,55(23):7-14,63.
[39]REN Y H.Survey of K-means algorithms on big data[J/OL].Application Research of Computers.https://doi.org/10.19734/j.issn.1001-3695.2019.10.0581.
[40]XING Y N,QIAN Y R,NAN F Z,et al.Survey of optimization on K-means algorithm in Spark[J].2020,37(3):641-647.
[41]LI Y,HAO Z F,WEN W,et al.Research on Differential Privacy Preserving k-means Clustering[J].Computer Science,2013,3:287-290.
[42]REN J,XIONG J,YAO Z,et al.DPLK-Means:A Novel Diffe-rential Privacy K-Means Mechanism[C]//IEEE Second International Conference on Data Science in Cyberspace.IEEE,2017.
[43]ZHANG G,ZHANG C,ZHANG H.Improved K-means Algo-rithm Based on Density Canopy[J].Knowledge-Based Systems,2018,145:289-297.
[44]XIA D,NING F,HE W.Research on Parallel Adaptive Canopy-K-Means Clustering Algorithm for Big Data Mining Based on Cloud Platform[J].Journal of Grid Computing,2020,18(2):263-273.
[45]LI L F.The analysis of K-means Clustering with Differential Privacy[D].Sichuan:Southwest Jiaotong University,2016.
[46]YAO S.An Improved Differential Privacy K-Means Algorithm Based on MapReduce[C]//2018 11th International Symposium on Computational Intelligence and Design (ISCID).IEEE,2018:141-145.
[47]SHANG T,ZHAO Z,GUAN Z,et al.A DP canopy k-means algorithm for privacy preservation of Hadoop Platform[C]//International Symposium on Cyberspace Safety and Security.Cham:Springer,2017:189-198.
[48]SHAO R M,ZHANG L,LIU Y,et al.A-PAM Clustering Algorithm Based on Differential Privacy Preserving[C]//2015 International Conference on Software,Multimedia and Communication Engineering.2015.
[49]YU Q,LUO Y,CHEN C,et al.Outlier-eliminated k-means clustering algorithm based on differential privacy preservation[J].Applied Intelligence,2016,45(4):1179-1191.
[50]FAN Y K,LIU J W.Parallel K-means algorithm with differen-tial privacy preservation and outlier pruning[J].Application Research of Computers,2019,6:1776-1781.
[51]MENG Q,ZHOU L.Research On Differential Privacy Preserving Clustering Algorithm Based On Spark Platform[J].Journal of Computers (Taiwan),2018,29(1):47-62.
[52]DWORK C.Differential privacy[C]//International Colloquiumon Automata,Languages,and Programming.Springer,2006:1-12.
[53]FAN Z,XU X.APDP k-means:A New Differential PrivacyClustering Algorithm Based on Arithmetic Progression Privacy Budget Allocation[C]//2019 IEEE 21st International Confe-rence on High Performance Computing and Communications.IEEE,2019.
[54]SU D,CAO J,LI N,et al.Differentially Private K-Means Clustering[C]//ACM.2016:26-37.
[55]FU Y M,LI Z D.Research on k-means++ Clustering Algorithm Based on Laplace Mechanism for Differential Privacy Protection[J].Netinfo Security,2019,218(2):49-58.
[56]ZHANG Y,LIU N,WANG S,et al.A differential privacy protecting K-means clustering algorithm based on contour coefficients[J].Plos One,2018,13(11):1-15.
[57]MO R,LIU J,YU W,et al.A Differential Privacy-Based Protecting Data Preprocessing Method for Big Data Mining[C]//2019 18th IEEE International Conference on Trust,Security And Privacy In Computing And Communications.IEEE,2019.
[58]DONG S U,CAO J N E,NINGHUI L I,et al.Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization[J].ACM Transaction on Information & System Security,2017,20(4):16.1-16.33.
[59]ZHANG J,XIAO X,YANG Y,et al.PrivGene:differentially private model fitting using genetic algorithms[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Ma-nagement of Data.2013:665-676.
[60]LU Z,SHEN H.A convergent differentially private k-means clustering algorithm[C]//Pacific-Asia Conference on Know-ledge Discovery and Data Mining.Springer,2019:612-624.
[61]LU Z,SHEN H.Differentially Private k-Means Clustering with Guaranteed Convergence[J].arXiv:2002.01043,2020.
[62]ACS G,MELIS L,CASTELLUCCIA C,et al.Differentially private mixture of generative neural networks[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(6):1109-1121.
[63]LV Z,WANG L,GUAN Z,et al.An Optimizing and Differen-tially Private Clustering Algorithm for Mixed Data in SDN based Smart Grid[J].IEEE Access,2019,7:45773-45782.
[64]NGUYEN H H.Privacy-Preserving Mechanisms for k-ModesClustering[J].Computers & Security,2018,78:60-75.
[65]NI T,QIAO M,CHEN Z,et al.Utility-efficient Differentially Private K-means Clustering based on Cluster Merging-ScienceDirect[J].Neurocomputing,2021,424:205-214.
[66]GAO Z Q,ZHANG L J.DPHKMS:An Efficient Hybrid Clus-tering Preserving Differential Privacy in Spark[C]//Internatio-nal Conference on Emerging Internetworking.Cham:Springer,2017.
[67]FRANK A,ASUNCION A.UCI machine learning repository[EB/OL]. [2022-01-13]. https://archive.ics.uci.edu/ml/index.php.
[68]PENG H L,JIN K Z,FU C C,et al.Private Time Series Pattern Mining with Sequential Lattice[J].Acta Electronica Sinica,2020,48(1):153-163.
[69]LI H C,WU X P,CHEN Y.k-means clustering method preserving differential privacy in MapReduce framework[J].Journal on Communications,2016,37(2):124-130.
[70]XIA C,HUA J,TONG W,et al.Distributed K-Means clustering guaranteeing local differential privacy[J].Computers & Security,2020,90:101699.1-101699.11.
[71]NGUYEN T D,GUPTA S,RANA S,et al.Privacy Aware K-Means Clustering with High Utility[C]//Pacific-Asia Confe-rence on Knowledge Discovery & Data Mining.Springer International Publishing,2016:388-400.
[72]ZHANG T,ZHU T,XIONG P,et al.Correlated Differential Privacy:Feature Selection in Machine Learning[J].IEEE Transactions on Industrial Informatics,2019,16(3):2115-2124.
[73]RATHI M,RAJAVAT A.High Dimensional Data Processing in Privacy Preserving Data Mining[C]//2020 IEEE 9th International Conference on Communication Systems and Network Technologies (CSNT).IEEE,2020.
[74]ZHANG T,ZHU T,LIU R,et al.Correlated Data in Differential Privacy:Definition and Analysis[J].Concurrency and Computation:Practice and Experience,2020:e6015.
[75]TRAN H Y,HU J.Privacy-preserving big data analytics a comprehensive survey[J].Journal of Parallel and Distributed Computing,2019,134:207-218.
[76]GUAN Z,LV Z,DU X,et al.Achieving Data Utility-PrivacyTradeoff in Internet of Medical Things:A Machine Learning Approach[J].Future Generation Computer Systems,2019,98:60-68.
[77]SAKELLARIOU G,GOUNARIS A.Homomorphically encryp-ted k-means on cloud-hosted servers with low client-side load[J].Computing,2019,101(12):1813-1836.
[1] TANG Ling-tao, WANG Di, ZHANG Lu-fei, LIU Sheng-yun. Federated Learning Scheme Based on Secure Multi-party Computation and Differential Privacy [J]. Computer Science, 2022, 49(9): 297-305.
[2] HUANG Jue, ZHOU Chun-lai. Frequency Feature Extraction Based on Localized Differential Privacy [J]. Computer Science, 2022, 49(7): 350-356.
[3] WANG Mei-shan, YAO Lan, GAO Fu-xiang, XU Jun-can. Study on Differential Privacy Protection for Medical Set-Valued Data [J]. Computer Science, 2022, 49(4): 362-368.
[4] DONG Xiao-mei, WANG Rui, ZOU Xin-kai. Survey on Privacy Protection Solutions for Recommended Applications [J]. Computer Science, 2021, 48(9): 21-35.
[5] SUN Lin, PING Guo-lou, YE Xiao-jun. Correlation Analysis for Key-Value Data with Local Differential Privacy [J]. Computer Science, 2021, 48(8): 278-283.
[6] ZHANG Xue-jun, YANG Hao-ying, LI Zhen, HE Fu-cun, GAI Ji-yang, BAO Jun-da. Differentially Private Location Privacy-preserving Scheme withSemantic Location [J]. Computer Science, 2021, 48(8): 300-308.
[7] CHEN Tian-rong, LING Jie. Differential Privacy Protection Machine Learning Method Based on Features Mapping [J]. Computer Science, 2021, 48(7): 33-39.
[8] WANG Le-ye. Geographic Local Differential Privacy in Crowdsensing:Current States and Future Opportunities [J]. Computer Science, 2021, 48(6): 301-305.
[9] PENG Chun-chun, CHEN Yan-li, XUN Yan-mei. k-modes Clustering Guaranteeing Local Differential Privacy [J]. Computer Science, 2021, 48(2): 105-113.
[10] WANG Mao-ni, PENG Chang-gen, HE Wen-zhu, DING Xing, DING Hong-fa. Privacy Metric Model of Differential Privacy via Graph Theory and Mutual Information [J]. Computer Science, 2020, 47(4): 270-277.
[11] WU Ying-jie, HUANG Xin, GE Chen, SUN Lan. Adaptive Parameter Optimization for Real-time Differential Privacy Streaming Data Publication [J]. Computer Science, 2019, 46(9): 99-105.
[12] LI Lan, YANG Chen, WANG An-fu. Study on Selection of Privacy Parameters ε in Differential Privacy Model [J]. Computer Science, 2019, 46(8): 201-205.
[13] HU Chuang, YANG Geng, BAI Yun-lu. Clustering Algorithm in Differential Privacy Preserving [J]. Computer Science, 2019, 46(2): 120-126.
[14] LI Sen-you, JI Xin-sheng, YOU Wei, ZHAO Xing. Hierarchical Control Strategy for Data Querying Based on Differential Privacy [J]. Computer Science, 2019, 46(11): 130-136.
[15] WANG Jing, SI Shu-jian. Attribute Revocable Access Control Scheme for Brain-Computer Interface Technology [J]. Computer Science, 2018, 45(9): 187-194.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!