Computer Science ›› 2021, Vol. 48 ›› Issue (8): 278-283.doi: 10.11896/jsjkx.201200122

• Information Security • Previous Articles     Next Articles

Correlation Analysis for Key-Value Data with Local Differential Privacy

SUN Lin, PING Guo-lou, YE Xiao-jun   

  1. School of Software,Tsinghua University,Beijing 100084,China
  • Received:2020-12-13 Revised:2021-05-13 Published:2021-08-10
  • About author:SUN Lin,born in 1993,Ph.D.His main research interests include privacy protection and data mining.(sunl16@mails.tsinghua.edu.cn)YE Xiao-jun,born in 1964,professor.His main research interests include cloud data management,data security and privacy,and database system testing.
  • Supported by:
    National Key Research and Development Program of China(2019QY1402).

Abstract: Crowdsourced data from distributed sources are routinely collected and analyzed to produce effective data-mining mo-dels in crowdsensing systems.Data usually contains personal information,which leads to possible privacy leakage in data collection and analysis.The local differential privacy (LDP) has been deemed as the de facto measure for trade-off between privacy guarantee and data utility.Currently,the key-value data is a kind of heterogeneous data types in which the key is categorical data and the value is numerical data.Achieving LDP for key-value data is challenging.This paper focuses on key-value data publishing and correlation analysis under the framework of LDP.Firstly,the frequency correlation and mean correlation in key-value data are defined.Then the indexing one-hot perturbation mechanism is proposed to provide LDP guarantees.At last,the correlation results can be estimated in the perturbed space.Theoretical analysis and experimental results on both real-word and synthetic dataset va-lidate the effectiveness of proposed mechanism.

Key words: Correlation analysis, frequency estimation, Key-value data, Local differential privacy, Mean estimation

CLC Number: 

  • TP391
[1]YANG G M,YANG J,ZHANG J P.Research on Data Publi-shing of Privacy Preserving[J].Computer Science,2011(9):17-23.
[2]WANG B,YANG J.Research on Anonymity Technique forPersonalization Privacy-preserving Data Publishing[J].Compu-ter Science,2012,39(4):168-171.
[3]DWORK C,MCSHERRY F,NISSIM K,et al.Calibrating noise to sensitivity in private data analysis[C]//Theory of Cryptography Conference.Heidelberg:Springer,2006:265-284.
[4]ABADI M,CHU A,GOODFELLOW I,et al.Deep learningwith differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.Vienna:ACM,2016:308-318.
[5]PHAN N H,WANG Y,WU X,et al.Differential privacy preservation for deep auto-encoders:an application of human behavior prediction[C]//Proceedings of the AAAI Conference on Artificial Intelligence.Arizona:AAAI Press,2016:1309-1315.
[6]CHEN R,XIAO Q,ZHANG Y,et al.Differentially privatehigh-dimensional data publication via sampling-based inference[C]//Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Sydeny:ACM,2015:129-138.
[7]DUCHI J C,JORDAN M I,WAINWRIGHT M J.Local privacy and statistical minimax rates[C]//2013 IEEE 54th Annual Symposium on Foundations of Computer Science.Washington:IEEE,2013:429-438.
[8]BLOOM B H.Space/time trade-offs in hash coding with allowa-ble errors[J].Communications of the ACM,1970,13(7):422-426.
[9]ERLINGSSON U,PIHUR V,KOROLOVA A.RAPPOR:Randomized aggregatable privacy-preserving ordinal response[C]//Proceedings of the 2014 ACM SIGSAC Conference on Compu-ter and Communications Security.Arizona:ACM,2014:1054-1067.
[10]NGUYEN T T,XIAO X X,YANG Y,et al.Collecting and Ana-lyzing Data from Smart Device Users with Local Differential Privacy[EB/OL].https://arxiv.org/abs/1606.05053.
[11]WANG N,XIAO X K,YANG Y,et al.Collecting and Analyzing Multidimensional Data with Local Differential Privacy[C]//IEEE 35th International Conference on Data Engineering.Macau:IEEE Press,2019:638-649.
[12]QIN Z,YANG Y,YU T,et al.Heavy hitter estimation over set-valued data with local differential privacy[C]//Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.Vienna:ACM,2016:192-203.
[13]SWANG T H,LI N H,JHA S.Locally differentially private frequent itemset mining[C]//2018 IEEE Symposium on Security and Privacy.San Francisco:IEEE Press,2018:127-143.
[14]BASSILY R,NISSIM K,STEMMER U,et al.Practical locally private heavy hitters[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems.New York:Curran Associates Inc,2017:2285-2293.
[15]CORMODE G,KULKARNI T,SRIVASTAVA D.Marginalrelease under local differential privacy[C]//Proceedings of the 2018 International Conference on Management of Data.Houston:ACM,2018:131-146.
[16]YE Q Q,HU H B,MENG X F.PrivKV:Key-Value Data Collection with Local Differential Privacy[C]//2019 IEEE Symposium on Security and Privacy (SP).San Francisco:IEEE Press,2019:317-331.
[17]GU X L,LI M,CHENG Y Q,et al.PCKV:Locally Differentially Private Correlated Key-Value Data Collection with Optimized Utility[C]//29th USENIX Security Symposium.USENIX Association,2020:967-984.
[18]ZHANG J,CORMODE G,PROCOPIUC C M,et al.Privbayes:Private data release via bayesian networks[J].ACM Transactions on Database Systems,2017,42(4):1-41.
[19]BARAK B,CHAUDHURI K,DWORK C,et al.Privacy,accuracy,and consistency too:a holistic solution to contingency table release[C]//Proceedings of the Twenty-sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems.New York:ACM,2007:273-282.
[20]DING B L,KULKARNI J,YEKHANIN S.Collecting telemetry data privately[C]//Proceedings of the 31st International Conference on Neural Information Processing.California:ACM,2017:3574-3583.
[21]DUCHI J C,JORDAN M I.Minimax Optimal Procedures forLocally Private Estimation[J].Journal of the American Statistical Association,2018,113(521):182-201.
[22]WANG T H,BLOCKI J,LI N H,et al.Locally differentiallyprivate protocols for frequency estimation[C]//26th USENIX Security Symposium.Vancouver:USENIX Association,2017:729-745.
[1] YANG Xiao, WANG Xiang-kun, HU Hao, ZHU Min. Survey on Visualization Technology for Equipment Condition Monitoring [J]. Computer Science, 2022, 49(7): 89-99.
[2] LI Si-quan, WAN Yong-jing, JIANG Cui-ling. Multiple Fundamental Frequency Estimation Algorithm Based on Generative Adversarial Networks for Image Removal [J]. Computer Science, 2022, 49(3): 179-184.
[3] PENG Chun-chun, CHEN Yan-li, XUN Yan-mei. k-modes Clustering Guaranteeing Local Differential Privacy [J]. Computer Science, 2021, 48(2): 105-113.
[4] ZHANG Qin, CHEN Hong-mei, FENG Yun-fei. Overlapping Community Detection Method Based on Rough Sets and Density Peaks [J]. Computer Science, 2020, 47(5): 72-78.
[5] LI Gang, WANG Chao, HAN De-peng, LIU Qiang-wei, LI Ying. Study on Multimodal Image Genetic Data Based on Deep Principal Correlated Auto-encoders [J]. Computer Science, 2020, 47(4): 60-66.
[6] DAI Huan, JIANG Jing-jing, SHU Qin-dong, SHI Peng-zhan, SHI Wen-hua. Vital Signs Monitoring Method Based on Channel State Phase Information [J]. Computer Science, 2020, 47(10): 48-54.
[7] LU Xian-guang, DU Xue-hui, WANG Wen-juan. Alert Correlation Algorithm Based on Improved FP Growth [J]. Computer Science, 2019, 46(8): 64-70.
[8] RU Feng, XU Jin, CHANG Qi, KAN Dan-hui. High Order Statistics Structured Sparse Algorithm for Image Genetic Association Analysis [J]. Computer Science, 2019, 46(4): 66-72.
[9] LIAO Bin, ZHANG Tao, LI Min, YU Jiong, GUO Bing-lei, LIU Yan. Consistency Checking Algorithm for Distributed Key-Value Database Based on Operation History Graph [J]. Computer Science, 2019, 46(12): 213-219.
[10] CHEN Zheng, TIAN Bo, HE Zeng-you. PPI Network Inference Algorithm for PCP-MS Data [J]. Computer Science, 2019, 46(12): 313-321.
[11] CHEN Feng, MENG Zu-qiang. Study on Heterogeneous Multimodal Data Retrieval Based on Hash Algorithm [J]. Computer Science, 2019, 46(10): 49-54.
[12] CHEN Li-li, ZHU Feng, SHENG Bin, CHEN Zhi-hua. Quality Evaluation of Color Image Based on Discrete Quaternion Fourier Transform [J]. Computer Science, 2018, 45(8): 70-74.
[13] LI Guang-pu, HUANG Miao-hua. Research Progress and Mainstream Methods of Frequent Itemsets Mining [J]. Computer Science, 2018, 45(11A): 1-11.
[14] WU Jun and WANG Chun-zhi. Multiple Correlation Analysis and Application of Granular Matrix Based on Big Data [J]. Computer Science, 2017, 44(Z11): 407-410.
[15] CUI Hong-fei, LIU Jia, GU Jing-jing and ZHUANG Yi. 3D Localization Estimation Algorithm Based on Locality Preserving Canonical Correlation Analysis in Wireless Sensor Networks [J]. Computer Science, 2017, 44(9): 105-109.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!