计算机科学 ›› 2023, Vol. 50 ›› Issue (3): 147-154.doi: 10.11896/jsjkx.211200248
王珠林1, 武优西1,2, 王月华3, 刘靖宇1,2
WANG Zhulin1, WU Youxi1,2, WANG Yuehua3, LIU Jingyu1,2
摘要: 间隙约束的序列模式挖掘是一种特殊形式的序列模式挖掘方法,该方法能够揭示一定间隔下的频繁出现(发生)的子序列。但当前间隙约束的序列模式挖掘方法只关注正序列模式的挖掘,忽略了事件中的缺失行为。为解决该问题,探索了周期间隙约束的负序列模式(Negative Sequential Pattern with Periodic Gap Constraints,NSPG)挖掘方法,该方法能够更灵活地反映元素与元素之间的关系。为高效求解NSPG挖掘问题,提出了NSPG-INtree(Incomplete Nettrees)算法,该算法主要包括两个步骤:候选模式生成和支持度计算。在候选模式生成方面,为了减少候选模式的数量,该算法采用模式连接策略;在支持度计算方面,为了提高模式支持度计算效率并减少空间消耗,该算法采用不完整网树结构计算模式支持度。实验结果表明,NSPG-INtree算法不仅具有较高的挖掘效率,而且能同时挖掘间隙约束的正序列模式和负序列模式。与其他间隙约束的序列模式挖掘算法相比,NSPG-INtree能够多发现209%~352%的模式;与不同策略的对比算法相比,NSPG-INtree能够缩短6%~38%的运行时间。
中图分类号:
| [1]ZHANG G L,YANG Q H,CHENG X M,et al.Application of sequence pattern mining in communication network alarm prediction[J].Computer Science,2018,45(11A):535-538. [2]LIU X B,WANG L Z,ZHOU L H.MLCPM-UC:A Multi-level co-location pattern mining algorithm based on uniform coefficient of pattern instance distribution [J].Computer Science,2021,48(11):208-218. [3]WU Y X,ZHU C R,LI Y,et al.NetNCSP:Nonoverlappingclosed sequential pattern mining [J].Knowledge-Based Systems,2020,196:105812. [4]WU Y X,LUO L F,LI Y,et al.NTP-Miner:Nonoverlapping three-way sequential pattern mining [J].ACM Transactions on Knowledge Discovery from Data,2022,16(3):1-21. [5]WU C F,CAI L,LI J,et al.Frequent pattern mining of resident'stravel based on multi-source location data [J].Computer Science,2021,48(7):155-163. [6]HUANGT C.Mining the change of customer behavior in fuzzy time-interval sequential patterns [J].Applied Soft Computing,2012,12(3):1068-1086. [7]WU Y X,WANG L L,REN J D,et al.Mining sequential patterns with periodic wildcard gaps [J].Applied Intelligence,2014,41(1):99-116. [8]WU Y X,LIU X,YAN W J,et al.Efficient algorithm for solving non-overlapping strict sequential pattern matching [J].Journal of Software,2021,32(11):3331-3350. [9]GUYET T,QUINIOU R.NegPSpan:Efficient extraction ofnegative sequential patterns with embedding constraints [J].Data Mining and Knowledge Discovery,2020,34(3):563-609. [10]ZHENG Z G,ZHAO Y C,ZUO Z Y,et al.Negative-GSP:An efficient method for mining negative sequential patterns[C]//Eighth Australasian Data Mining Conference.2009:63-67. [11]DING B,LO D,HAN J W,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]//IEEE International Conference on Data Engineering.2009:1024-1035. [12]WU Y X,TONG Y,ZHU X Q,et al.NOSEP:Nonoverlapping sequence pattern mining with gap constraints [J].IEEE Transactions on Cybernetics,2018,48(10):2809-2822. [13]SHI Q S,SHAN J S,YAN W J,et al.NetNPG:Nonoverlapping pattern matching with general gap constraints [J].Applied Intelligence,2020,50(1):1832-1845. [14]WANG L,WANG S,LIU S L,et al.An algorithm of mining sequential pattern with wildcards based on Index-Tree [J].Chinese Journal of Computers,2019,42(3):555-565. [15]ZHOU Z Y,PI D C.Rare sequence pattern mining algorithm for shopping basket data [J].Journal of Chinese Computer Systems,2019,40(3):683-688. [16]WU Y X,GENG M,LI Y,et al.HANP-Miner:High averageutility nonoverlapping sequential pattern mining[J].Knowledge-Based Systems,2021,229,107361. [17]WU Y X,WANG Y H,LI Y,et al.Top-k self-adaptive contrast sequential pattern mining [J].IEEE Transactions on Cyberne-tics,2022,52(11):11819-11833.. [18]WU Y X,LEI R,LI Y,et al.HAOP-Miner:Self-adaptive high-average utility one-off sequential pattern mining [J].Expert Systems with Applications,2021,184,115449. [19]WU Y X,ZHOU K,LIU J Y,et al.Mining sequential patterns with periodic general gap constraints [J].Chinese Journal of Computers,2017,40(6):1338-1352. [20]HUANG G,GAN W,HUANG S,et al.Negative pattern discovery with individual support [J].Knowledge-Based Systems,2022,251,109194. [21]LIN N P,CHEN H J,HAO W H.Mining negative sequential patterns [C]//Conference on Applied Computer Science.2007:654-658. [22]OUYANG W M,HUANG Q H,LUO S H.Mining positive and negative fuzzy sequential patterns in large transaction databases[C]//Fifth International Conference on Fuzzy Systems and Knowledge Discovery.2008:18-23. [23]HSUEH S C,LIN M Y,CHEN C L.Mining negative sequential patterns for e-commerce recommendations[C]//IEEE Asia-Pacific Services Computing Conference.2008:1213-1218. [24]CAO L B,D X J,ZHENG Z G.E-NSP:Efficient negative se-quential pattern mining [J].Artificial Intelligence,2016,235(1):156-182. [25]DONG X J,GONG Y S,CAO L B.F-NSP +:A fast negative sequential patterns mining method with self-adaptive data sto-rage[J].Pattern Recognition,2018,84:13-27. [26]DONG X J,GONG Y S,CAO L B.e-RNSP:An efficient method for mining repetition negative sequential patterns[J].IEEE Transactions on Cybernetics,2018,50(5):2084-2096. [27]WU Y X,WU X D,MIN F,et al.A Nettree for Pattern Ma-tching with Flexible Wildcard Constraints[C]//IEEE Interna-tional Conference on Information Reuse & Integration.2010:109-114. | 
| [1] | 吴成凤, 蔡莉, 李劲, 梁宇. 基于多源位置数据的居民出行频繁模式挖掘 Frequent Pattern Mining of Residents’ Travel Based on Multi-source Location Data 计算机科学, 2021, 48(7): 155-163. https://doi.org/10.11896/jsjkx.200800072 | 
| [2] | 陆鑫赟, 王兴芬. 基于领域关联冗余的教务数据关联规则挖掘 Educational Administration Data Mining of Association Rules Based on Domain Association Redundancy 计算机科学, 2019, 46(6A): 427-430. | 
| [3] | 曾庆田, 刘晨征, 倪维健, 段华. 面向序数回归的组合特征提取方法 Combined Feature Extraction Method for Ordinal Regression 计算机科学, 2019, 46(6): 69-74. https://doi.org/10.11896/j.issn.1002-137X.2019.06.009 | 
| [4] | 张洪泽, 洪征, 王辰, 冯文博, 吴礼发. 基于闭合序列模式挖掘的未知协议格式推断方法 Closed Sequential Patterns Mining Based Unknown Protocol Format Inference Method 计算机科学, 2019, 46(6): 80-89. https://doi.org/10.11896/j.issn.1002-137X.2019.06.011 | 
| [5] | 孙文平, 常亮, 宾辰忠, 古天龙, 孙彦鹏. 基于知识图谱和频繁序列挖掘的旅游路线推荐 Travel Route Recommendation Based on Knowledge Graph and Frequent Sequence Mining 计算机科学, 2019, 46(2): 56-61. https://doi.org/10.11896/j.issn.1002-137X.2019.02.009 | 
| [6] | 荣俸萍,方勇,左政,刘亮. MACSPMD:基于恶意API调用序列模式挖掘的恶意代码检测 MACSPMD:Malicious API Call Sequential Pattern Mining Based Malware Detection 计算机科学, 2018, 45(5): 131-138. https://doi.org/10.11896/j.issn.1002-137X.2018.05.022 | 
| [7] | 张光兰, 杨秋辉, 程雪梅, 姜科, 王帅, 谭武坤. 序列模式挖掘在通信网络告警预测中的应用 Application of Sequence Pattern Mining in Communication Network Alarm Prediction 计算机科学, 2018, 45(11A): 535-538. | 
| [8] | 王树怡,董东. 基于聚类和偏序序列的API用法模式挖掘 Mining of API Usage Pattern Based on Clustering and Partial Order Sequences 计算机科学, 2017, 44(Z6): 486-490. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.108 | 
| [9] | 江晶晶,王志海,原继东. 一种基于数据流模式表示的半懒惰式分类算法 Partially-lazy Learning Classification Algorithm Based on Representation of Data Stream Model 计算机科学, 2017, 44(7): 167-174. https://doi.org/10.11896/j.issn.1002-137X.2017.07.030 | 
| [10] | 崔展齐,牟永敏,张志华,王伟光. 基于函数调用序列模式挖掘的程序缺陷检测 Defects Detection Based on Mining Function Call Sequence Patterns 计算机科学, 2017, 44(11): 226-231. https://doi.org/10.11896/j.issn.1002-137X.2017.11.034 | 
| [11] | 曾新,杨健. 带时间约束的co-location模式挖掘 Co-location Patterns Mining with Time Constraint 计算机科学, 2016, 43(2): 293-296. https://doi.org/10.11896/j.issn.1002-137X.2016.02.061 | 
| [12] | 姬浩博,王俊红. 一种改进的PrefixSpan算法及其在Web用户行为模式挖掘中的应用 Research on Improved PrefixSpan Algorithm and its Application in Web User Behavior Patterns Mining 计算机科学, 2016, 43(1): 25-29. https://doi.org/10.11896/j.issn.1002-137X.2016.01.006 | 
| [13] | 崔亮,郭静,吴玲达. 一种基于动态散列和事务压缩的关联规则挖掘算法 Algorithm for Mining Association Roles Based on Dynamic Hashing and Transaction Reduction 计算机科学, 2015, 42(9): 41-44. https://doi.org/10.11896/j.issn.1002-137X.2015.09.009 | 
| [14] | 程舒通,徐从富,但红卫. 高效隐私保护频繁模式挖掘算法研究 Research on Efficient Privacy Preserving Frequent Pattern Mining Algorithm 计算机科学, 2015, 42(4): 194-198. https://doi.org/10.11896/j.issn.1002-137X.2015.04.039 | 
| [15] | 唐家维,王晓峰. 基于GPU的并行化Apriori算法的设计与实现 Design and Implementation of Apriori on GPU 计算机科学, 2014, 41(10): 238-243. https://doi.org/10.11896/j.issn.1002-137X.2014.10.050 | 
| 
 | ||