Computer Science ›› 2019, Vol. 46 ›› Issue (12): 26-30.doi: 10.11896/jsjkx.181202289

• Big Data & Data Science • Previous Articles     Next Articles

Distinguishing Patterns Mining Based on Density Constraint

CHAI Xin, GAO Yi-han, WU You-xi, LIU Jing-yu   

  1. (School of Artificial Intelligence,Hebei University of Technology,Tianjin 300401,China);
    (Hebei Province Key Laboratory of Big Data Calculation,Tianjin 300401,China)
  • Received:2018-12-11 Online:2019-12-15 Published:2019-12-17

Abstract: Sequential patterns mining is to find interest patterns from sequential data.Distinguishing patterns mining is one of the mining methods,which is characterized by finding feature information in two or more categories of sequence databases.It is widely used in real life and production.With the increasing size of data,the efficiency of algorithm mi-ning is particularly important.However,the mining speed of distinguishing patterns mining is too slow at present.In order to quickly mine the distinguishing patterns that satisfy density constraint and gap constraint,this paper proposed an approximate solution algorithm ADMD (Approximately Distinguishing Patterns Mining Based on Density Constraint).This algorithm allows a small number of patterns to be lost in the process of patterns mining in exchange for a large increase in mining speed.In this algorithm,the support of the pattern is calculated by the special structure of the Net tree,the candidate patterns are generated by patterns growth approach,and the patterns are pruned by the prejudgment pruning strategy to avoid the generation of a large number of redundant patterns.However,some non-redundant patterns may be pruned in the pruning process,resulting in incomplete mining results,so the algorithm is an approximate algorithm.Based on ADMD,the ADMD-k algorithm was proposed by setting the parameter k in the pruning strategy.The algorithm can adjust the pruning degree by setting k,to achieve a balance between mining efficiency and accuracy.Finally,in real protein datasets,the number of mining patterns and mining speed are compared with other algorithms.The experimental results verify that when k is 1.5,the proposed algorithm costs no more than 13% of the time,but can find up more than 99% of patterns.Therefore,the proposed algorithm is very effective with high approximation rate and high speed.

Key words: Density constraint, Distinguishing patterns, Mining speed, Net tree, Pruning strategy

CLC Number: 

  • TP311
[1]AGRAWAL R,SRIKANT R.Mining sequential patterns[C]//Proceedings of 11th International Conference on Data Enginee-ring.1995:3-14.
[2]EXARCHOS T P,TSIPOURAS M G,PAPALOUKAS C,et al.A two-stage methodology for sequence classification based on sequential pattern mining and optimization [J].Data & Know-ledge Engineering,2008,66(3):467-487.
[3]DING B,LO D,HAN J,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]//IEEE International Conference on Data Engineering.Shanghai,China:IEEE Computer Society,2009:1024-1035.
[4]WANG H,DING S F.Research and development of sequential pattern mining [J].Computer Science,2009,36(12):14-17.
[5]MAO G J,HU D J,XIE S Y.Models and algorithms for classi- fying big data based on distributed data streams [J].Chinese Journal of Computers,2017,40(1):161-175.
[6]WU Y X,SHEN C,JIANG H,et al.Strict pattern matching under non-overlapping condition [J].Science China Information Sciences,2017,60(1):012101.
[7]TAN C D,MIN F,WANG M,et al.Discovering patterns with weak-wildcard gaps [J].IEEE Access,2016,4(1):4922-4932.
[8]MIN F,ZHANG Z H,ZHAI W J,et al.Frequent pattern disco- very with tri-partition alphabets [J].Information Sciences.doi:10.1016/j.ins.2018.04.013.
[9]GONG Y S,XU T T,DONG X J,et al.e-NSPFI:Efficient mi- ning negative sequential pattern from both frequent and infrequent positive sequential patterns [J].International Journal of Pattern Recognition and Artificial Intelligence,2017,31(2):1-20.
[10]DONG X J,GONG Y S,CAO L B.F-NSP+:A fast negative sequential patterns mining method with self-adaptive data storage [J].Pattern Recognition,2018(84):13-27.
[11]WU Y X,TONG Y,ZHU X Q,et al.NOSEP:Nonoverlapping sequence pattern mining with map constraints [J].IEEE Transactions on Cybernetics,2018,48(10):2809-2822.
[12]WANG H F,DUAN L,ZUO J,et al.Efficient mining of distinguishing sequential patterns without a predefined gap constraint[J].Chinese Journal of Computers,2016,39(10):1979-1991.
[13]YANG H,DUAN L,HU B,et al.Mining top-k distinguishing sequential patterns with gap constraint[J].Journal of Software,2015,26(11):2994-3009.
[14]WANG X M,DUAN L,DONG G Z,et al.Efficient mining of density-aware distinguishing sequential patterns with gap constraints[C]//International Conference on Database Systems for Advanced Applications.2014:372-387.
[15]GHOSH S,FENG M,NGUYEN H,et al.Risk prediction for acute hypotensive patients by using gap constrained sequential contrast patterns [C]//AMIA Annual Symposium Proceedings American Medical Informatics Association.2014:1748-1757.
[16]ZHENG Z G,WEI W,LIU C M,et al.An effective contrast sequential pattern mining approach to taxpayer behavior analysis [J].World Wide Web-Internet & Web Information Systems,2016,19(4):633-651.
[17]WEI Q S,WU Y X,LIU J Y,et al.Distinguishing sequence patterns mining based on density on density and gap constraints [J].Computer Science,2018,45(4):252-256.
[18]WU Y X,WANG L L,REN J D,et al.Mining sequential patterns with periodic wildcard gaps [J].Applied Intelligence,2014,41(1):99-116.
[1] WEI Qin-shuang, WU You-xi, LIU Jing-yu and ZHU Huai-zhong. Distinguishing Sequence Patterns Mining Based on Density and Gap Constraints [J]. Computer Science, 2018, 45(4): 252-256.
[2] CHEN Feng,CHAO Wen-han,ZHOU Qing and LI Zhou-jun. Convolution Tree Kernel Based Sentiment Element Recognition Approach for Chinese Microblog [J]. Computer Science, 2014, 41(12): 133-137.
[3] ZHENG Tie-Ran ZHANG Zhan HAN Ji-Qing (School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001). [J]. Computer Science, 2008, 35(1): 216-218.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!