计算机科学 ›› 2019, Vol. 46 ›› Issue (12): 26-30.doi: 10.11896/jsjkx.181202289
柴欣, 高一寒, 武优西, 刘靖宇
CHAI Xin, GAO Yi-han, WU You-xi, LIU Jing-yu
摘要: 序列模式挖掘是从序列数据中发现用户感兴趣的模式。对比模式挖掘是其中的一类挖掘方法,其特点是在两类或多类别的序列库中找到特征信息,在实际的生活和生产中应用十分广泛。随着数据规模的不断增加,算法的挖掘效率显得尤为重要,但是当前对比模式挖掘仍存在挖掘速度太慢的问题。为了快速挖掘满足密度约束和间隙约束的对比模式,文中提出了一种近似求解算法ADMD(Approximately Distinguishing Patterns Mining Based on Density Constraint),该算法在模式的挖掘过程中允许存在小部分的模式丢失,从而换取挖掘速度的大幅提升。该算法采用网树的特殊结构来计算模式的支持数;采用模式拼接的方式来生成候选模式;采用预判式剪枝策略对模式进行剪枝,以避免大量冗余模式的生成。但由于在剪枝过程中可能会剪掉一部分非冗余模式,造成挖掘结果并非完备,因此该算法是一种近似求解算法。在ADMD算法的基础上,通过在剪枝策略中设定参数k的方式来得到ADMD-k算法,该算法可以通过设定k的取值来调整剪枝程度,从而在挖掘效率和准确率方面取得平衡。最后在真实的蛋白质数据集上将所提算法与其他算法从挖掘的对比模式数量和挖掘速度方面进行对比实验。实验结果表明,在k=1.5的情况下,所提算法仅用不到原来13%的时间,就可以挖掘到99%以上的模式,具有近似度高、速度快的特点。
中图分类号:
[1]AGRAWAL R,SRIKANT R.Mining sequential patterns[C]//Proceedings of 11th International Conference on Data Enginee-ring.1995:3-14.[2]EXARCHOS T P,TSIPOURAS M G,PAPALOUKAS C,et al.A two-stage methodology for sequence classification based on sequential pattern mining and optimization [J].Data & Know-ledge Engineering,2008,66(3):467-487.[3]DING B,LO D,HAN J,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]//IEEE International Conference on Data Engineering.Shanghai,China:IEEE Computer Society,2009:1024-1035.[4]WANG H,DING S F.Research and development of sequential pattern mining [J].Computer Science,2009,36(12):14-17.[5]MAO G J,HU D J,XIE S Y.Models and algorithms for classi- fying big data based on distributed data streams [J].Chinese Journal of Computers,2017,40(1):161-175.[6]WU Y X,SHEN C,JIANG H,et al.Strict pattern matching under non-overlapping condition [J].Science China Information Sciences,2017,60(1):012101.[7]TAN C D,MIN F,WANG M,et al.Discovering patterns with weak-wildcard gaps [J].IEEE Access,2016,4(1):4922-4932.[8]MIN F,ZHANG Z H,ZHAI W J,et al.Frequent pattern disco- very with tri-partition alphabets [J].Information Sciences.doi:10.1016/j.ins.2018.04.013.[9]GONG Y S,XU T T,DONG X J,et al.e-NSPFI:Efficient mi- ning negative sequential pattern from both frequent and infrequent positive sequential patterns [J].International Journal of Pattern Recognition and Artificial Intelligence,2017,31(2):1-20.[10]DONG X J,GONG Y S,CAO L B.F-NSP+:A fast negative sequential patterns mining method with self-adaptive data storage [J].Pattern Recognition,2018(84):13-27.[11]WU Y X,TONG Y,ZHU X Q,et al.NOSEP:Nonoverlapping sequence pattern mining with map constraints [J].IEEE Transactions on Cybernetics,2018,48(10):2809-2822.[12]WANG H F,DUAN L,ZUO J,et al.Efficient mining of distinguishing sequential patterns without a predefined gap constraint[J].Chinese Journal of Computers,2016,39(10):1979-1991.[13]YANG H,DUAN L,HU B,et al.Mining top-k distinguishing sequential patterns with gap constraint[J].Journal of Software,2015,26(11):2994-3009.[14]WANG X M,DUAN L,DONG G Z,et al.Efficient mining of density-aware distinguishing sequential patterns with gap constraints[C]//International Conference on Database Systems for Advanced Applications.2014:372-387.[15]GHOSH S,FENG M,NGUYEN H,et al.Risk prediction for acute hypotensive patients by using gap constrained sequential contrast patterns [C]//AMIA Annual Symposium Proceedings American Medical Informatics Association.2014:1748-1757.[16]ZHENG Z G,WEI W,LIU C M,et al.An effective contrast sequential pattern mining approach to taxpayer behavior analysis [J].World Wide Web-Internet & Web Information Systems,2016,19(4):633-651.[17]WEI Q S,WU Y X,LIU J Y,et al.Distinguishing sequence patterns mining based on density on density and gap constraints [J].Computer Science,2018,45(4):252-256.[18]WU Y X,WANG L L,REN J D,et al.Mining sequential patterns with periodic wildcard gaps [J].Applied Intelligence,2014,41(1):99-116. |
[1] | 魏芹双,武优西,刘靖宇,朱怀忠. 基于密度约束和间隙约束的对比模式挖掘 Distinguishing Sequence Patterns Mining Based on Density and Gap Constraints 计算机科学, 2018, 45(4): 252-256. https://doi.org/10.11896/j.issn.1002-137X.2018.04.042 |
[2] | 董雷刚,刘国华,崔晓微. PPQ:一种基于区域划分的c-skyline查询算法 PPQ:Finding Combinatorial Skyline Based on Partition 计算机科学, 2018, 45(1): 267-272. https://doi.org/10.11896/j.issn.1002-137X.2018.01.047 |
[3] | 季辉,丁泽军. 双人博弈问题中的蒙特卡洛树搜索算法的改进 Improvement of Monte Carlo Tree Search Algorithm in Two-person Game Problem 计算机科学, 2018, 45(1): 140-143. https://doi.org/10.11896/j.issn.1002-137X.2018.01.023 |
|