Computer Science, 2021, Vol. 48, Issue (8): 278-283. doi: 10.11896/jsjkx.201200122

• Information Security •


Correlation Analysis for Key-Value Data with Local Differential Privacy

SUN Lin, PING Guo-lou, YE Xiao-jun

  1. School of Software, Tsinghua University, Beijing 100084, China
  • Received: 2020-12-13 Revised: 2021-05-13 Published: 2021-08-10
  • Corresponding author: YE Xiao-jun (yexj@tsinghua.edu.cn)
  • About author: SUN Lin, born in 1993, Ph.D. His main research interests include privacy protection and data mining. (sunl16@mails.tsinghua.edu.cn) YE Xiao-jun, born in 1964, professor. His main research interests include cloud data management, data security and privacy, and database system testing.
  • Supported by:
    National Key Research and Development Program of China (2019QY1402).

Abstract: In crowdsensing systems, data are continuously collected from distributed sources and analyzed to provide decision support for data-mining models. Because the data may contain personal information, collection and analysis carry a risk of privacy leakage. Local differential privacy (LDP), an advanced privacy protection scheme, offers a good trade-off between user privacy and data utility. Key-value data are heterogeneous: the key is categorical and the value is numerical, so correlation analysis of key-value data across multiple dimensions under LDP is challenging. This paper addresses the publishing and correlation analysis of key-value data under LDP. First, the frequency correlation and mean correlation problems for key-value data are defined. Then, an indexing one-hot perturbation mechanism suited to key-value pairs is proposed to provide LDP guarantees. Finally, the correlation results are estimated from the perturbed data. Theoretical analysis and experiments on both synthetic and real-world datasets validate the effectiveness of the proposed mechanism.

Key words: Correlation analysis, Frequency estimation, Key-value data, Local differential privacy, Mean estimation
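
As background for the frequency estimation step named in the abstract, the following is a minimal sketch of a generic LDP frequency estimator over one-hot encoded keys. It uses standard symmetric unary encoding (randomized response on each bit), which is an assumption chosen for illustration and is not the paper's indexing one-hot perturbation mechanism. The function names, the key distribution, and the parameter values are all hypothetical.

```python
import math
import random


def keep_probability(epsilon):
    # Each bit is kept with probability p = e^(eps/2) / (e^(eps/2) + 1).
    # Two one-hot vectors differ in two bits, so the report satisfies eps-LDP.
    return math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)


def perturb_one_hot(key_index, num_keys, epsilon, rng):
    """Client side: one-hot encode the key, then flip each bit with prob. 1 - p."""
    p = keep_probability(epsilon)
    report = []
    for j in range(num_keys):
        true_bit = 1 if j == key_index else 0
        report.append(true_bit if rng.random() < p else 1 - true_bit)
    return report


def estimate_frequencies(reports, epsilon):
    """Server side: unbiased estimate of each key's frequency from noisy reports."""
    n = len(reports)
    num_keys = len(reports[0])
    p = keep_probability(epsilon)
    q = 1 - p
    # E[observed fraction of ones] = f * p + (1 - f) * q, so invert for f.
    return [
        (sum(r[j] for r in reports) / n - q) / (p - q)
        for j in range(num_keys)
    ]


# Hypothetical demo: 20000 users, 4 keys, epsilon = 2.
rng = random.Random(0)
num_keys, epsilon, n_users = 4, 2.0, 20000
keys = rng.choices(range(num_keys), weights=[0.5, 0.3, 0.15, 0.05], k=n_users)
true_freqs = [keys.count(j) / n_users for j in range(num_keys)]
reports = [perturb_one_hot(k, num_keys, epsilon, rng) for k in keys]
estimates = estimate_frequencies(reports, epsilon)
```

In this sketch the server never sees a raw key, only the perturbed bit vectors. The estimates are unbiased but noisy, and the noise grows as epsilon shrinks or as the number of users drops. Handling the numerical value of each key-value pair, and the correlation between keys, is outside what this sketch covers.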


  • TP391
