Computer Science (计算机科学), 2016, Vol. 43, Issue (12): 195-199. doi: 10.11896/j.issn.1002-137X.2016.12.035

• Data Mining •

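Sequential Pattern Mining Based on Privacy Preserving

FANG Wei-wei (方炜炜), XIE Wei (谢伟), HUANG Hong-bo (黄宏博) and XIA Hong-ke (夏红科)

  1. Computing Center, Beijing Information Science and Technology University, Beijing 100192; School of Economics and Management, Tsinghua University, Beijing 100010
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the Key Program of the National Natural Science Foundation of China (60675030), the National Natural Science Foundation of China (60875029), the 2015 Outstanding Talent Training Program of the Organization Department of the Beijing Municipal Committee, and the 2016 Science and Technology General Program of the Beijing Municipal Education Commission.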

Abstract: Privacy preservation is an active research topic in data mining. Its goal is to carry out mining tasks accurately without exposing the original data. To address the problem of privacy-preserving sequential pattern mining, this paper introduces the concept of Boolean set relationships between itemsets and designs a method that perturbs the original sequence database using random sets and a perturbation function. The properties of the perturbation function are then used to recover the true support of the frequent sequential patterns in the original sequence database, so that frequent sequential patterns are mined accurately while the privacy of the original data is protected. Theoretical analysis and experimental results show that the method performs well in terms of data privacy protection, accuracy of mining results, and execution efficiency.

Key words: Sequential pattern, Data mining, Privacy preserving, Data perturbation
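
The abstract does not spell out the random-set and perturbation-function construction, so the following is only a generic sketch of the underlying idea, not the authors' method: data are perturbed with a known random rule, and the true support is recovered by inverting that rule. It assumes a single item recorded as 0/1 per sequence and a simple rule that keeps each bit with probability p; the names `perturb`, `estimate_support`, and `p` are illustrative assumptions.

```python
import random

def perturb(bits, p, rng):
    """Keep each 0/1 bit with probability p, flip it otherwise (assumed rule)."""
    return [b if rng.random() < p else 1 - b for b in bits]

def estimate_support(observed_support, p):
    """Invert o = s*p + (1 - s)*(1 - p) to recover the true support s."""
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information; support cannot be recovered")
    return (observed_support - (1 - p)) / (2 * p - 1)

# Toy data: 10,000 sequences, 30% of which contain the item.
rng = random.Random(0)
true_bits = [1] * 3000 + [0] * 7000
noisy_bits = perturb(true_bits, p=0.9, rng=rng)

# The miner only sees noisy_bits, yet can estimate the original support.
observed = sum(noisy_bits) / len(noisy_bits)
print(round(estimate_support(observed, 0.9), 3))
```

The estimate is unbiased but noisy: values of p closer to 0.5 hide individual records better and make the recovered support less precise, which is the privacy and accuracy trade-off the abstract refers to.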

