%A CHAI Xin, GAO Yi-han, WU You-xi, LIU Jing-yu
%T Distinguishing Patterns Mining Based on Density Constraint
%0 Journal Article
%D 2019
%J Computer Science
%R 10.11896/jsjkx.181202289
%P 26-30
%V 46
%N 12
%U {https://www.jsjkx.com/CN/abstract/article_18769.shtml}
%8 2019-12-15
%X Sequential pattern mining aims to find interesting patterns in sequential data. Distinguishing pattern mining is one such method; it is characterized by finding feature information in two or more categories of sequence databases, and it is widely used in real life and production. As data volumes grow, the efficiency of mining algorithms becomes particularly important, yet distinguishing pattern mining is currently too slow. To quickly mine the distinguishing patterns that satisfy a density constraint and a gap constraint, this paper proposes an approximate algorithm, ADMD (Approximately Distinguishing Patterns Mining Based on Density Constraint). The algorithm allows a small number of patterns to be lost during mining in exchange for a large increase in mining speed. In this algorithm, the support of a pattern is calculated using the special structure of the Net tree, candidate patterns are generated by a pattern growth approach, and patterns are pruned by a prejudgment pruning strategy to avoid generating a large number of redundant patterns. However, some non-redundant patterns may also be pruned, which leaves the mining results incomplete, so the algorithm is approximate. Building on ADMD, the ADMD-*k* algorithm is proposed by introducing a parameter *k* into the pruning strategy. Setting *k* adjusts the degree of pruning and thereby balances mining efficiency against accuracy. Finally, the number of mined patterns and the mining speed are compared with those of other algorithms on real protein datasets. The experimental results verify that when *k* is 1.5, the proposed algorithm takes no more than 13% of the time yet finds more than 99% of the patterns. The proposed algorithm is therefore effective, combining a high approximation rate with high speed.