计算机科学 ›› 2019, Vol. 46 ›› Issue (10): 71-76.doi: 10.11896/jsjkx.190100223
王斌, 马俊杰, 房新秀, 魏天佑
WANG Bin, MA Jun-jie, FANG Xin-xiu, WEI Tian-you
摘要: 基于时间戳的关联规则挖掘算法(SLMCM)主要用于解决新增项的问题,但效率较低,难以适应大数据挖掘。针对这个问题,文中提出了改进算法E-SLMCM和DE-SLMCM。E-SLMCM算法采用垂直结构,仅需遍历数据库两次,在将数据库转化为垂直格式时,可直接记录各项的时间戳,且不需要将每条事务的各项按时间戳进行排序;另外,提出了新的求项集时间戳的方法,在求更高项集的时间戳时不用多次遍历数据库。E-SLMCM算法适合应用于稀疏数据库,为了提高在密集数据库上的运行效率,在E-SLMCM算法的基础上采用差集思想提出了DE-SLMCM算法。所列举的4个基于公共数据集的仿真实验中,在不同最小支持度条件下,E-SLMCM和DE-SLMCM分别在稀疏和密集数据集上运行的时间效率是SLMCM的10~1 000倍。
中图分类号:
[1]AGRAWAL R,IMIELIHSKI T,SWAMI A.Mining association rules between sets of items in large databases[J].Acm sigmod record,1993,22(2):207-216. [2]HAN J,PEI J,YIN Y .Mining Frequent Patterns Without Candidate Generation[C]//Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.Dallas,Te-xas,USA:ACM,2000. [3]ZAKI M J.Scalable algorithms for association mining[J].IEEE transactions on knowledge and data engineering,2000,12(3):372-390. [4]RASHID M M,GONDAL I,KAMRUZZAMAN J.Mining associated patterns from wireless sensor networks[J].IEEE Tran-sactions on Computers,2015,64(7):1998-2011. [5]WANG L,MENG J,XU P,et al.Mining temporal association rules with frequent itemsets tree[J].Applied Soft Computing,2018,62:817-829. [6]CHEN C H,LAN G C,HONG T P,et al.Mining fuzzy temporal association rules by item lifespans[J].Applied Soft Computing,2016,41:265-274. [7]ZOU C,DENG H,WAN J,et al.Mining and updating association rules based on fuzzy concept lattice[J].Future Generation Computer Systems,2018,82:698-706. [8]MIHOLCA D L,CZIBULA G,CRIVEI L M.A new incremental relational association rules mining approach[J].Procedia Computer Science,2018,126:126-135. [9]NGUYEN D,LUO W,PHUNG D,et al.LTARM:A novel temporal association rule mining method to understand toxicities in a routine cancer treatment[J].Knowledge-Based Systems,2018,161:313-328. [10]ZHANG S C,ZHANG J L,CHEN F,et al.Negative Incremental Updating Algorithm for Maintaining Association rules[J].Computer Science,2005,32(9):153-155.(in Chinese) 张师超,张继连,陈峰,等.负增量式关联规则更新算法[J].计算机科学,2005,32(9):153-155. [11]CAI J,XUE Y S,LIN L,et al.Updating Algorithm for Association Rules Based on Fully M ining Incremental Transactions[J].Computer Science,2007,34(2):220-222.(in Chinese) 蔡进,薛永生,林丽,等.基于充分挖掘增量事务的关联规则更新算法[J].计算机科学,2007,34(2):220-222. [12]GAN W,LIN C W,FOURNIER-VIGER P,et al.Mining of frequent patterns with multiple minimum supports[J].Engineering Applications of Artificial Intelligence,2017,60(C):83-96. [13]DARRAB S,ERGENC B.Vertical Pattern Mining Algorithm for Multiple Support Thresholds[J].Procedia ComputerScien-ce,2017,112:417-426. [14]HU H W,CHANG H C,LIN W S,et al.An optimized frequent pattern mining algorithm with multiple minimum supports[C]//IEEE International Conference on Big Data.IEEE,2017. [15]DAHBI A,BALOUKI Y,GADI T.Using Multiple Minimum Support to Auto-adjust the Threshold of Support in Apriori Algorithm[C]//International Conference on Soft Computing and Pattern Recognition.Cham:Springer,2017:111-119. [16]WENG C H.Identifying association rules of specific later-marketed products[J].Applied Soft Computing,2016,38:518-529. [17]GAO J,SHI B L.Research on Fast Association Rule Mining Algorithm[J].Computer Science,2005,32(3):200-201.(in Chinese) c高俊,施伯乐.快速关联规则挖掘算法研究[J].计算机科学,2005,32(3):200-201. [18]LIU D,LIU Y,YIN Z.Fast algorithm for discovering maximum frequent itemsets of association rules[J].Acta Scientiarium Natu-ralium Universitatis Jilinensis,2004,42(2):212-215. [19]ZAKI M J,GOUDA K.Fast vertical mining using diffsets[C]//Proceedings of the Ninth ACM SIGKDD International Confe-rence on Knowledge Discovery and Data Mining.ACM,2003:326-335. [20]FOURNIER-VIGER P,GOMARIZ A,SOLTANI A,et al. SPMF:OpenSource Data Mining Platform[EB/OL].http://www.philippe-fournier-viger.com/spmf. |
[1] | 曹扬晨, 朱国胜, 孙文和, 吴善超. 未知网络攻击识别关键技术研究 Study on Key Technologies of Unknown Network Attack Identification 计算机科学, 2022, 49(6A): 581-587. https://doi.org/10.11896/jsjkx.210400044 |
[2] | 徐慧慧, 晏华. 基于相对危险度的儿童先心病风险因素分析算法 Relative Risk Degree Based Risk Factor Analysis Algorithm for Congenital Heart Disease in Children 计算机科学, 2021, 48(6): 210-214. https://doi.org/10.11896/jsjkx.200500082 |
[3] | 沈夏炯, 杨继勇, 张磊. 基于不相关属性集合的属性探索算法 Attribute Exploration Algorithm Based on Unrelated Attribute Set 计算机科学, 2021, 48(4): 54-62. https://doi.org/10.11896/jsjkx.200800082 |
[4] | 李艾玲, 张凤荔, 高强, 王瑞锦. 基于自适应时间戳与多尺度特征提取的轨迹下一足迹预测模型 Trajectory Next Footprint Prediction Model Based on Adaptive Timestamp and Multi-scale Feature Extraction 计算机科学, 2021, 48(11A): 191-197. https://doi.org/10.11896/jsjkx.201200015 |
[5] | 崔巍, 贾晓琳, 樊帅帅, 朱晓燕. 一种新的不均衡关联分类算法 New Associative Classification Algorithm for Imbalanced Data 计算机科学, 2020, 47(6A): 488-493. https://doi.org/10.11896/JsJkx.190600132 |
[6] | 张素梅, 张波涛. 一种基于量子耗散粒子群的评估模型构建方法 Evaluation Model Construction Method Based on Quantum Dissipative Particle Swarm Optimization 计算机科学, 2020, 47(6A): 84-88. https://doi.org/10.11896/JsJkx.190900148 |
[7] | 陈孟辉, 曹黔峰, 兰彦琦. 基于区块挖掘与重组的启发式算法求解置换流水车间调度问题 Heuristic Algorithm Based on Block Mining and Recombination for Permutation Flow-shop Scheduling Problem 计算机科学, 2020, 47(6A): 108-113. https://doi.org/10.11896/JsJkx.190300151 |
[8] | 王青松, 姜富山, 李菲. 大数据环境下基于关联规则的多标签学习算法 Multi-label Learning Algorithm Based on Association Rules in Big Data Environment 计算机科学, 2020, 47(5): 90-95. https://doi.org/10.11896/jsjkx.190300150 |
[9] | 朱岸青, 李帅, 唐晓东. Spark平台中的并行化FP_growth关联规则挖掘方法 Parallel FP_growth Association Rules Mining Method on Spark Platform 计算机科学, 2020, 47(12): 139-143. https://doi.org/10.11896/jsjkx.191000110 |
[10] | 张蕾,蔡明. 基于主题融合和关联规则挖掘的图像标注 Image Annotation Based on Topic Fusion and Frequent Patterns Mining 计算机科学, 2019, 46(7): 246-251. https://doi.org/10.11896/j.issn.1002-137X.2019.07.037 |
[11] | 张维国. 面向知识推荐服务的选课决策 Decision Making of Course Selection Oriented by Knowledge Recommendation Service 计算机科学, 2019, 46(6A): 507-510. |
[12] | 陆鑫赟, 王兴芬. 基于领域关联冗余的教务数据关联规则挖掘 Educational Administration Data Mining of Association Rules Based on Domain Association Redundancy 计算机科学, 2019, 46(6A): 427-430. |
[13] | 李智星, 任诗雅, 王化明, 沈柯. 基于非结构化文本增强关联规则的知识推理方法 Knowledge Reasoning Method Based on Unstructured Text-enhanced Association Rules 计算机科学, 2019, 46(11): 209-215. https://doi.org/10.11896/jsjkx.181001939 |
[14] | 贾熹滨,叶颖婕,陈军成. 基于关联规则的交通事故影响因素的挖掘 Influence Factors Mining of Traffic Accidents Based on Association Rules 计算机科学, 2018, 45(6A): 447-452. |
[15] | 丁勇, 朱长水, 武玉艳. 一种基于Hadoop的关联规则挖掘算法 Association Rule Mining Algorithm Based on Hadoop 计算机科学, 2018, 45(11A): 409-411. |
|