计算机科学 ›› 2019, Vol. 46 ›› Issue (10): 71-76.doi: 10.11896/jsjkx.190100223

• 大数据与数据科学* • 上一篇    下一篇

基于时间戳和垂直格式的关联规则挖掘算法

王斌, 马俊杰, 房新秀, 魏天佑   

  1. (青岛理工大学信息与控制工程学院 山东 青岛266520)
  • 收稿日期:2019-01-26 修回日期:2019-05-05 出版日期:2019-10-15 发布日期:2019-10-21
  • 通讯作者: 王斌(1963-),男,博士,教授,主要研究方向为知识发现、博弈论及应用等,E-mail:wb769@sina.com。
  • 作者简介:马俊杰(1992-),男,硕士,主要研究方向为数据挖掘等;房新秀(1994-),女,硕士,主要研究方向为数据挖掘等;魏天佑(1994-),男,硕士,主要研究方向为数据挖掘等。
  • 基金资助:
    本文受国家自然科学基金项目(61502262)资助。

Association Rule Mining Algorithm Based on Timestamp and Vertical Format

WANG Bin, MA Jun-jie, FANG Xin-xiu, WEI Tian-you   

  1. (School of Information and Control Engineering,Qingdao University of Technology,Qingdao,Shandong 266520,China)
  • Received:2019-01-26 Revised:2019-05-05 Online:2019-10-15 Published:2019-10-21

摘要: 基于时间戳的关联规则挖掘算法(SLMCM)主要用于解决新增项的问题,但效率较低,难以适应大数据挖掘。针对这个问题,文中提出了改进算法E-SLMCM和DE-SLMCM。E-SLMCM算法采用垂直结构,仅需遍历数据库两次,在将数据库转化为垂直格式时,可直接记录各项的时间戳,且不需要将每条事务的各项按时间戳进行排序;另外,提出了新的求项集时间戳的方法,在求更高项集的时间戳时不用多次遍历数据库。E-SLMCM算法适合应用于稀疏数据库,为了提高在密集数据库上的运行效率,在E-SLMCM算法的基础上采用差集思想提出了DE-SLMCM算法。所列举的4个基于公共数据集的仿真实验中,在不同最小支持度条件下,E-SLMCM和DE-SLMCM分别在稀疏和密集数据集上运行的时间效率是SLMCM的10~1 000倍。

关键词: 差集, 关联规则, 时间戳, 新增项

Abstract: The SLMCM algorithm (Specific Later-marketed Consequent Mining) is mainly used to solve the problem of later item,but it is inefficient and difficult to adapt to big data mining.For this problem,this paper proposed the improved algorithms E-SLMCM and DE-SLMCM .E-SLMCM algorithm is based on vertical structure,so it only traversal the database twice.Furthermore,the timestamp of each item can be directly calculated when the format is converted to vertical,and each transaction does not need to be sorted by the timestamp of the item.In addition,a new method for finding the itemset timestamp was proposed,which dose not need to traverse the database to find the timestamp of itemset.In order to adapt to dense database,DE-SLMCM algorithm was proposed based on E-SLMCM algorithm and diffset,which improves the execution efficiency on dense database.In the listed four simulation experiments based on common data sets,the time efficiency of E-SLMCM and DE-SLMCM running on sparse and dense data sets is 10-1000 times higher than that of SLMCM.

Key words: Association rules, Diffset, Later item, Timestamp

中图分类号: 

  • TP391
[1]AGRAWAL R,IMIELIHSKI T,SWAMI A.Mining association rules between sets of items in large databases[J].Acm sigmod record,1993,22(2):207-216.
[2]HAN J,PEI J,YIN Y .Mining Frequent Patterns Without Candidate Generation[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.Dallas,Te-xas,USA:ACM,2000.
[3]ZAKI M J.Scalable algorithms for association mining[J].IEEE transactions on knowledge and data engineering,2000,12(3):372-390.
[4]RASHID M M,GONDAL I,KAMRUZZAMAN J.Mining associated patterns from wireless sensor networks[J].IEEE Tran-sactions on Computers,2015,64(7):1998-2011.
[5]WANG L,MENG J,XU P,et al.Mining temporal association rules with frequent itemsets tree[J].Applied Soft Computing,2018,62:817-829.
[6]CHEN C H,LAN G C,HONG T P,et al.Mining fuzzy temporal association rules by item lifespans[J].Applied Soft Computing,2016,41:265-274.
[7]ZOU C,DENG H,WAN J,et al.Mining and updating association rules based on fuzzy concept lattice[J].Future Generation Computer Systems,2018,82:698-706.
[8]MIHOLCA D L,CZIBULA G,CRIVEI L M.A new incremental relational association rules mining approach[J].Procedia Computer Science,2018,126:126-135.
[9]NGUYEN D,LUO W,PHUNG D,et al.LTARM:A novel temporal association rule mining method to understand toxicities in a routine cancer treatment[J].Knowledge-Based Systems,2018,161:313-328.
[10]ZHANG S C,ZHANG J L,CHEN F,et al.Negative Incremental Updating Algorithm for Maintaining Association rules[J].Computer Science,2005,32(9):153-155.(in Chinese)
张师超,张继连,陈峰,等.负增量式关联规则更新算法[J].计算机科学,2005,32(9):153-155.
[11]CAI J,XUE Y S,LIN L,et al.Updating Algorithm for Association Rules Based on Fully M ining Incremental Transactions[J].Computer Science,2007,34(2):220-222.(in Chinese)
蔡进,薛永生,林丽,等.基于充分挖掘增量事务的关联规则更新算法[J].计算机科学,2007,34(2):220-222.
[12]GAN W,LIN C W,FOURNIER-VIGER P,et al.Mining of frequent patterns with multiple minimum supports[J].Engineering Applications of Artificial Intelligence,2017,60(C):83-96.
[13]DARRAB S,ERGENC B.Vertical Pattern Mining Algorithm for Multiple Support Thresholds[J].Procedia ComputerScien-ce,2017,112:417-426.
[14]HU H W,CHANG H C,LIN W S,et al.An optimized frequent pattern mining algorithm with multiple minimum supports[C]//IEEE International Conference on Big Data.IEEE,2017.
[15]DAHBI A,BALOUKI Y,GADI T.Using Multiple Minimum Support to Auto-adjust the Threshold of Support in Apriori Algorithm[C]//International Conference on Soft Computing and Pattern Recognition.Cham:Springer,2017:111-119.
[16]WENG C H.Identifying association rules of specific later-marketed products[J].Applied Soft Computing,2016,38:518-529.
[17]GAO J,SHI B L.Research on Fast Association Rule Mining Algorithm[J].Computer Science,2005,32(3):200-201.(in Chinese)
c高俊,施伯乐.快速关联规则挖掘算法研究[J].计算机科学,2005,32(3):200-201.
[18]LIU D,LIU Y,YIN Z.Fast algorithm for discovering maximum frequent itemsets of association rules[J].Acta Scientiarium Natu-ralium Universitatis Jilinensis,2004,42(2):212-215.
[19]ZAKI M J,GOUDA K.Fast vertical mining using diffsets[C]//Proceedings of the Ninth ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining.ACM,2003:326-335.
[20]FOURNIER-VIGER P,GOMARIZ A,SOLTANI A,et al. SPMF:OpenSource Data Mining Platform[EB/OL].http://www.philippe-fournier-viger.com/spmf.
[1] 曹扬晨, 朱国胜, 孙文和, 吴善超.
未知网络攻击识别关键技术研究
Study on Key Technologies of Unknown Network Attack Identification
计算机科学, 2022, 49(6A): 581-587. https://doi.org/10.11896/jsjkx.210400044
[2] 徐慧慧, 晏华.
基于相对危险度的儿童先心病风险因素分析算法
Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children
计算机科学, 2021, 48(6): 210-214. https://doi.org/10.11896/jsjkx.200500082
[3] 沈夏炯, 杨继勇, 张磊.
基于不相关属性集合的属性探索算法
Attribute Exploration Algorithm Based on Unrelated Attribute Set
计算机科学, 2021, 48(4): 54-62. https://doi.org/10.11896/jsjkx.200800082
[4] 李艾玲, 张凤荔, 高强, 王瑞锦.
基于自适应时间戳与多尺度特征提取的轨迹下一足迹预测模型
Trajectory Next Footprint Prediction Model Based on Adaptive Timestamp and Multi-scale Feature Extraction
计算机科学, 2021, 48(11A): 191-197. https://doi.org/10.11896/jsjkx.201200015
[5] 崔巍, 贾晓琳, 樊帅帅, 朱晓燕.
一种新的不均衡关联分类算法
New Associative Classification Algorithm for Imbalanced Data
计算机科学, 2020, 47(6A): 488-493. https://doi.org/10.11896/JsJkx.190600132
[6] 张素梅, 张波涛.
一种基于量子耗散粒子群的评估模型构建方法
Evaluation Model Construction Method Based on Quantum Dissipative Particle Swarm Optimization
计算机科学, 2020, 47(6A): 84-88. https://doi.org/10.11896/JsJkx.190900148
[7] 陈孟辉, 曹黔峰, 兰彦琦.
基于区块挖掘与重组的启发式算法求解置换流水车间调度问题
Heuristic Algorithm Based on Block Mining and Recombination for Permutation Flow-shop Scheduling Problem
计算机科学, 2020, 47(6A): 108-113. https://doi.org/10.11896/JsJkx.190300151
[8] 王青松, 姜富山, 李菲.
大数据环境下基于关联规则的多标签学习算法
Multi-label Learning Algorithm Based on Association Rules in Big Data Environment
计算机科学, 2020, 47(5): 90-95. https://doi.org/10.11896/jsjkx.190300150
[9] 朱岸青, 李帅, 唐晓东.
Spark平台中的并行化FP_growth关联规则挖掘方法
Parallel FP_growth Association Rules Mining Method on Spark Platform
计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110
[10] 张蕾,蔡明.
基于主题融合和关联规则挖掘的图像标注
Image Annotation Based on Topic Fusion and Frequent Patterns Mining
计算机科学, 2019, 46(7): 246-251. https://doi.org/10.11896/j.issn.1002-137X.2019.07.037
[11] 张维国.
面向知识推荐服务的选课决策
Decision Making of Course Selection Oriented by Knowledge Recommendation Service
计算机科学, 2019, 46(6A): 507-510.
[12] 陆鑫赟, 王兴芬.
基于领域关联冗余的教务数据关联规则挖掘
Educational Administration Data Mining of Association Rules Based on Domain Association Redundancy
计算机科学, 2019, 46(6A): 427-430.
[13] 李智星, 任诗雅, 王化明, 沈柯.
基于非结构化文本增强关联规则的知识推理方法
Knowledge Reasoning Method Based on Unstructured Text-enhanced Association Rules
计算机科学, 2019, 46(11): 209-215. https://doi.org/10.11896/jsjkx.181001939
[14] 贾熹滨,叶颖婕,陈军成.
基于关联规则的交通事故影响因素的挖掘
Influence Factors Mining of Traffic Accidents Based on Association Rules
计算机科学, 2018, 45(6A): 447-452.
[15] 丁勇, 朱长水, 武玉艳.
一种基于Hadoop的关联规则挖掘算法
Association Rule Mining Algorithm Based on Hadoop
计算机科学, 2018, 45(11A): 409-411.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!