Computer Science, 2021, Vol. 48, Issue (2): 105-113. doi: 10.11896/jsjkx.200700172

• Database & Big Data & Data Science •

k-modes Clustering Guaranteeing Local Differential Privacy

PENG Chun-chun, CHEN Yan-li, XUN Yan-mei   

  1. College of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2020-07-25 Revised: 2020-09-01 Online: 2021-02-15 Published: 2021-02-04
  • About author: PENG Chun-chun, born in 1996, postgraduate. His main research interests include privacy preserving and data mining.
    CHEN Yan-li, born in 1969, Ph.D., professor. Her main research interests include network security and computer architecture.
  • Supported by:
    The National Natural Science Foundation of China (61572263, 61272084).

Abstract: How to mine data usefully while protecting data privacy has become a hot issue. In many practical scenarios, it is difficult to find a trusted third party to process sensitive data. This paper proposes the first locally differentially private k-modes mechanism (LDPK-modes) for this distributed scenario. Unlike standard differentially private clustering mechanisms, the proposed mechanism does not need a trusted third party to collect and preprocess users' data. Users perturb their data with a randomized response mechanism that satisfies local d-privacy (local differential privacy with a distance metric). After the third party collects the users' perturbed data, it restores the statistical features of the data and generates a synthetic data set. The frequent attribute values in this data set are assigned to the initial cluster centers, and k-modes clustering then starts from them. Theoretical analysis shows that the proposed algorithm satisfies local d-privacy. Experimental results show that the proposal preserves the quality of clustering results well without a trusted third-party data collector.
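
The perturb-then-restore idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's mechanism, which satisfies local d-privacy with a distance metric. Instead it uses standard generalized randomized response, which satisfies plain ε-local differential privacy, for a single categorical attribute. The function names `grr_perturb` and `grr_estimate`, the example domain, and the value of epsilon are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def grr_perturb(value, domain, epsilon, rng):
    """User side: keep the true value with probability p,
    otherwise report a different value chosen uniformly."""
    k = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return value
    others = [v for v in domain if v != value]
    return rng.choice(others)


def grr_estimate(reports, domain, epsilon):
    """Collector side: unbiased estimate of the true value frequencies.

    A report equals v with probability f_v * p + (1 - f_v) * q,
    so f_v is recovered as (observed_share - q) / (p - q).
    """
    k = len(domain)
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)
    q = 1.0 / (e + k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, only to make the demo repeatable
    domain = ["red", "green", "blue"]  # hypothetical attribute domain
    truth = ["red"] * 12000 + ["green"] * 6000 + ["blue"] * 2000
    reports = [grr_perturb(v, domain, 2.0, rng) for v in truth]
    estimates = grr_estimate(reports, domain, 2.0)
    # The most frequent estimated value could seed one component of an
    # initial cluster mode, in the spirit of the initialization above.
    print(estimates, max(estimates, key=estimates.get))
```

The collector never sees true values, yet the estimated frequencies sum to one and approach the true distribution as the number of users grows. A smaller epsilon gives stronger privacy at the cost of noisier estimates.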

Key words: Clustering, d-privacy, k-modes, Local differential privacy, Privacy preserving

CLC Number: 

  • TP309