计算机科学 ›› 2023, Vol. 50 ›› Issue (3): 147-154.doi: 10.11896/jsjkx.211200248

• 数据库&大数据&数据科学 • 上一篇    下一篇

具有周期间隙约束的负序列模式挖掘

王珠林1, 武优西1,2, 王月华3, 刘靖宇1,2   

  1. 1 河北工业大学人工智能与数据科学学院 天津 300401
    2 河北省大数据计算重点实验室 天津 300401
    3 河北工业大学经济管理学院 天津 300401
  • 收稿日期:2021-12-23 修回日期:2022-06-21 出版日期:2023-03-15 发布日期:2023-03-15
  • 通讯作者: 武优西(wuc567@163.com)
  • 作者简介:(1299056565@qq.com)
  • 基金资助:
    国家自然科学基金(61976240)

Mining Negative Sequential Patterns with Periodic Gap Constraints

WANG Zhulin1, WU Youxi1,2, WANG Yuehua3, LIU Jingyu1,2   

  1. 1 School of Artificial Intelligence,Hebei University of Technology,Tianjin 300401,China
    2 Hebei Key Laboratory of Big Data Computing,Tianjin 300401,China
    3 School of Economics and Management,Hebei University of Technology,Tianjin 300401,China
  • Received:2021-12-23 Revised:2022-06-21 Online:2023-03-15 Published:2023-03-15
  • About author:WANG Zhulin,born in 1997,postgra-duate,is a member of China Computer Federation.His main research interests include data mining and so on.
    WU Youxi,born in 1974,Ph.D,professor,Ph.D supervisor,is a distinguished member of China Computer Federation.His main research interests include data mining and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61976240).

摘要: 间隙约束的序列模式挖掘是一种特殊形式的序列模式挖掘方法,该方法能够揭示一定间隔下的频繁出现(发生)的子序列。但当前间隙约束的序列模式挖掘方法只关注正序列模式的挖掘,忽略了事件中的缺失行为。为解决该问题,探索了周期间隙约束的负序列模式(Negative Sequential Pattern with Periodic Gap Constraints,NSPG)挖掘方法,该方法能够更灵活地反映元素与元素之间的关系。为高效求解NSPG挖掘问题,提出了NSPG-INtree(Incomplete Nettrees)算法,该算法主要包括两个步骤:候选模式生成和支持度计算。在候选模式生成方面,为了减少候选模式的数量,该算法采用模式连接策略;在支持度计算方面,为了提高模式支持度计算效率并减少空间消耗,该算法采用不完整网树结构计算模式支持度。实验结果表明,NSPG-INtree算法不仅具有较高的挖掘效率,而且能同时挖掘间隙约束的正序列模式和负序列模式。与其他间隙约束的序列模式挖掘算法相比,NSPG-INtree能够多发现209%~352%的模式;与不同策略的对比算法相比,NSPG-INtree能够缩短6%~38%的运行时间。

关键词: 序列模式挖掘, 负序列模式, 频繁模式, 间隙约束, 不完整网树

Abstract: Sequential pattern mining with gap constraints is a special form of sequential pattern mining,which can reveal frequent subsequences in a certain gap.However,the current sequential pattern mining methods with gap constraints only focus on positive sequential pattern mining,and ignore the missing behavior in a series of events.To solve this problem,a negative sequential pattern method with periodic gap constraints(NSPG) mining is explored,which can reflect the relationship between elements more flexibly.To solve the problem of NSPG mining,this paper proposes an NSPG-INtree(incomplete nettrees) algorithm,which includes two key steps:candidate pattern generation and support calculation.For candidate pattern generation,to reduce the number of candidate patterns,the algorithm uses a pattern join strategy.For support calculation,to improve the efficiency and reduce space consumption,the algorithm employs an incomplete nettree structure to calculate the supports of patterns.Experimental results show that NSPG-INtree not only has high mining efficiency,but also can mine positive and negative sequential patterns with gap constraints.NSPG-INtree can find 209% ~ 352% more patterns than other gap-constrained sequential pattern mining algorithms.Moreover,NSPG-INtree can reduce the running time by 6%~38% than other competitive algorithms with different stra-tegies.

Key words: Sequential pattern mining, Negative sequential pattern, Frequent pattern, Gap constraint, Incomplete nettree

中图分类号: 

  • TP311
[1]ZHANG G L,YANG Q H,CHENG X M,et al.Application of sequence pattern mining in communication network alarm prediction[J].Computer Science,2018,45(11A):535-538.
[2]LIU X B,WANG L Z,ZHOU L H.MLCPM-UC:A Multi-level co-location pattern mining algorithm based on uniform coefficient of pattern instance distribution [J].Computer Science,2021,48(11):208-218.
[3]WU Y X,ZHU C R,LI Y,et al.NetNCSP:Nonoverlappingclosed sequential pattern mining [J].Knowledge-Based Systems,2020,196:105812.
[4]WU Y X,LUO L F,LI Y,et al.NTP-Miner:Nonoverlapping three-way sequential pattern mining [J].ACM Transactions on Knowledge Discovery from Data,2022,16(3):1-21.
[5]WU C F,CAI L,LI J,et al.Frequent pattern mining of resident'stravel based on multi-source location data [J].Computer Science,2021,48(7):155-163.
[6]HUANGT C.Mining the change of customer behavior in fuzzy time-interval sequential patterns [J].Applied Soft Computing,2012,12(3):1068-1086.
[7]WU Y X,WANG L L,REN J D,et al.Mining sequential patterns with periodic wildcard gaps [J].Applied Intelligence,2014,41(1):99-116.
[8]WU Y X,LIU X,YAN W J,et al.Efficient algorithm for solving non-overlapping strict sequential pattern matching [J].Journal of Software,2021,32(11):3331-3350.
[9]GUYET T,QUINIOU R.NegPSpan:Efficient extraction ofnegative sequential patterns with embedding constraints [J].Data Mining and Knowledge Discovery,2020,34(3):563-609.
[10]ZHENG Z G,ZHAO Y C,ZUO Z Y,et al.Negative-GSP:An efficient method for mining negative sequential patterns[C]//Eighth Australasian Data Mining Conference.2009:63-67.
[11]DING B,LO D,HAN J W,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]//IEEE International Conference on Data Engineering.2009:1024-1035.
[12]WU Y X,TONG Y,ZHU X Q,et al.NOSEP:Nonoverlapping sequence pattern mining with gap constraints [J].IEEE Transactions on Cybernetics,2018,48(10):2809-2822.
[13]SHI Q S,SHAN J S,YAN W J,et al.NetNPG:Nonoverlapping pattern matching with general gap constraints [J].Applied Intelligence,2020,50(1):1832-1845.
[14]WANG L,WANG S,LIU S L,et al.An algorithm of mining sequential pattern with wildcards based on Index-Tree [J].Chinese Journal of Computers,2019,42(3):555-565.
[15]ZHOU Z Y,PI D C.Rare sequence pattern mining algorithm for shopping basket data [J].Journal of Chinese Computer Systems,2019,40(3):683-688.
[16]WU Y X,GENG M,LI Y,et al.HANP-Miner:High averageutility nonoverlapping sequential pattern mining[J].Knowledge-Based Systems,2021,229,107361.
[17]WU Y X,WANG Y H,LI Y,et al.Top-k self-adaptive contrast sequential pattern mining [J].IEEE Transactions on Cyberne-tics,2022,52(11):11819-11833..
[18]WU Y X,LEI R,LI Y,et al.HAOP-Miner:Self-adaptive high-average utility one-off sequential pattern mining [J].Expert Systems with Applications,2021,184,115449.
[19]WU Y X,ZHOU K,LIU J Y,et al.Mining sequential patterns with periodic general gap constraints [J].Chinese Journal of Computers,2017,40(6):1338-1352.
[20]HUANG G,GAN W,HUANG S,et al.Negative pattern discovery with individual support [J].Knowledge-Based Systems,2022,251,109194.
[21]LIN N P,CHEN H J,HAO W H.Mining negative sequential patterns [C]//Conference on Applied Computer Science.2007:654-658.
[22]OUYANG W M,HUANG Q H,LUO S H.Mining positive and negative fuzzy sequential patterns in large transaction databases[C]//Fifth International Conference on Fuzzy Systems and Knowledge Discovery.2008:18-23.
[23]HSUEH S C,LIN M Y,CHEN C L.Mining negative sequential patterns for e-commerce recommendations[C]//IEEE Asia-Pacific Services Computing Conference.2008:1213-1218.
[24]CAO L B,D X J,ZHENG Z G.E-NSP:Efficient negative se-quential pattern mining [J].Artificial Intelligence,2016,235(1):156-182.
[25]DONG X J,GONG Y S,CAO L B.F-NSP +:A fast negative sequential patterns mining method with self-adaptive data sto-rage[J].Pattern Recognition,2018,84:13-27.
[26]DONG X J,GONG Y S,CAO L B.e-RNSP:An efficient method for mining repetition negative sequential patterns[J].IEEE Transactions on Cybernetics,2018,50(5):2084-2096.
[27]WU Y X,WU X D,MIN F,et al.A Nettree for Pattern Ma-tching with Flexible Wildcard Constraints[C]//IEEE Interna-tional Conference on Information Reuse & Integration.2010:109-114.
[1] 吴成凤, 蔡莉, 李劲, 梁宇.
基于多源位置数据的居民出行频繁模式挖掘
Frequent Pattern Mining of Residents’ Travel Based on Multi-source Location Data
计算机科学, 2021, 48(7): 155-163. https://doi.org/10.11896/jsjkx.200800072
[2] 陆鑫赟, 王兴芬.
基于领域关联冗余的教务数据关联规则挖掘
Educational Administration Data Mining of Association Rules Based on Domain Association Redundancy
计算机科学, 2019, 46(6A): 427-430.
[3] 曾庆田, 刘晨征, 倪维健, 段华.
面向序数回归的组合特征提取方法
Combined Feature Extraction Method for Ordinal Regression
计算机科学, 2019, 46(6): 69-74. https://doi.org/10.11896/j.issn.1002-137X.2019.06.009
[4] 张洪泽, 洪征, 王辰, 冯文博, 吴礼发.
基于闭合序列模式挖掘的未知协议格式推断方法
Closed Sequential Patterns Mining Based Unknown Protocol Format Inference Method
计算机科学, 2019, 46(6): 80-89. https://doi.org/10.11896/j.issn.1002-137X.2019.06.011
[5] 孙文平, 常亮, 宾辰忠, 古天龙, 孙彦鹏.
基于知识图谱和频繁序列挖掘的旅游路线推荐
Travel Route Recommendation Based on Knowledge Graph and Frequent Sequence Mining
计算机科学, 2019, 46(2): 56-61. https://doi.org/10.11896/j.issn.1002-137X.2019.02.009
[6] 荣俸萍,方勇,左政,刘亮.
MACSPMD:基于恶意API调用序列模式挖掘的恶意代码检测
MACSPMD:Malicious API Call Sequential Pattern Mining Based Malware Detection
计算机科学, 2018, 45(5): 131-138. https://doi.org/10.11896/j.issn.1002-137X.2018.05.022
[7] 张光兰, 杨秋辉, 程雪梅, 姜科, 王帅, 谭武坤.
序列模式挖掘在通信网络告警预测中的应用
Application of Sequence Pattern Mining in Communication Network Alarm Prediction
计算机科学, 2018, 45(11A): 535-538.
[8] 王树怡,董东.
基于聚类和偏序序列的API用法模式挖掘
Mining of API Usage Pattern Based on Clustering and Partial Order Sequences
计算机科学, 2017, 44(Z6): 486-490. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.108
[9] 江晶晶,王志海,原继东.
一种基于数据流模式表示的半懒惰式分类算法
Partially-lazy Learning Classification Algorithm Based on Representation of Data Stream Model
计算机科学, 2017, 44(7): 167-174. https://doi.org/10.11896/j.issn.1002-137X.2017.07.030
[10] 崔展齐,牟永敏,张志华,王伟光.
基于函数调用序列模式挖掘的程序缺陷检测
Defects Detection Based on Mining Function Call Sequence Patterns
计算机科学, 2017, 44(11): 226-231. https://doi.org/10.11896/j.issn.1002-137X.2017.11.034
[11] 曾新,杨健.
带时间约束的co-location模式挖掘
Co-location Patterns Mining with Time Constraint
计算机科学, 2016, 43(2): 293-296. https://doi.org/10.11896/j.issn.1002-137X.2016.02.061
[12] 姬浩博,王俊红.
一种改进的PrefixSpan算法及其在Web用户行为模式挖掘中的应用
Research on Improved PrefixSpan Algorithm and its Application in Web User Behavior Patterns Mining
计算机科学, 2016, 43(1): 25-29. https://doi.org/10.11896/j.issn.1002-137X.2016.01.006
[13] 崔亮,郭静,吴玲达.
一种基于动态散列和事务压缩的关联规则挖掘算法
Algorithm for Mining Association Roles Based on Dynamic Hashing and Transaction Reduction
计算机科学, 2015, 42(9): 41-44. https://doi.org/10.11896/j.issn.1002-137X.2015.09.009
[14] 程舒通,徐从富,但红卫.
高效隐私保护频繁模式挖掘算法研究
Research on Efficient Privacy Preserving Frequent Pattern Mining Algorithm
计算机科学, 2015, 42(4): 194-198. https://doi.org/10.11896/j.issn.1002-137X.2015.04.039
[15] 唐家维,王晓峰.
基于GPU的并行化Apriori算法的设计与实现
Design and Implementation of Apriori on GPU
计算机科学, 2014, 41(10): 238-243. https://doi.org/10.11896/j.issn.1002-137X.2014.10.050
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!