Computer Science ›› 2017, Vol. 44 ›› Issue (7): 175-179. doi: 10.11896/j.issn.1002-137X.2017.07.031

• Artificial Intelligence •

Emerging Sequences Pattern Mining Based on Location Information

CHEN Xiang-tao, XIAO Bi-wen

  1. College of Information Science and Engineering, Hunan University, Changsha 410082
  • Online: 2018-11-13 Published: 2018-11-13
  • Supported by:
    This work was supported by the Natural Science Foundation of Hunan Province (2015JJ2032) and the Science and Technology Plan Project of Hunan Province (2014WK2002).

Abstract: Because of their strong discriminative power, emerging sequences are often used to build effective classifiers. Most existing algorithms focus on the support or the occurrence count of a sequence pattern and ignore where the pattern occurs within a sequence, so some important information is lost. This paper proposes an emerging sequence pattern that carries local location information, together with an algorithm for mining such patterns. The algorithm is based on the occurrence-count framework and uses a suffix tree, which lets it skip the generation and selection of candidate patterns and mine emerging sequence patterns with location information quickly and effectively. Experimental results show that classifiers built from emerging sequence patterns with location information achieve higher average classification accuracy than those built with traditional emerging sequence pattern mining algorithms.

Key words: Occurrence count, Emerging sequence, Subsequence, Location information
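
The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: it enumerates substrings by brute force instead of using a suffix tree, and it assumes that "location" means the equal-width segment of the sequence in which a substring starts. The function names, the `bins` parameter, and the thresholds `min_count` and `min_ratio` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def positional_counts(sequences, max_len=3, bins=2):
    """Count (substring, bin) occurrences across a list of strings.

    The bin is the segment of the sequence (0 .. bins-1) in which the
    substring starts. This binning is an illustrative assumption, not
    the paper's definition of location.
    """
    counts = Counter()
    for seq in sequences:
        n = len(seq)
        for start in range(n):
            bin_index = min(start * bins // n, bins - 1)
            for length in range(1, max_len + 1):
                if start + length > n:
                    break
                counts[(seq[start:start + length], bin_index)] += 1
    return counts


def positional_emerging(pos_seqs, neg_seqs, min_count=2, min_ratio=2.0, **kw):
    """Return (substring, bin) pairs that occur often in pos_seqs
    and rarely in neg_seqs, mapped to their two occurrence counts."""
    pos = positional_counts(pos_seqs, **kw)
    neg = positional_counts(neg_seqs, **kw)
    result = {}
    for key, c_pos in pos.items():
        if c_pos < min_count:
            continue
        c_neg = neg.get(key, 0)
        # A pattern absent from the other class has an unbounded ratio.
        ratio = float("inf") if c_neg == 0 else c_pos / c_neg
        if ratio >= min_ratio:
            result[key] = (c_pos, c_neg)
    return result


if __name__ == "__main__":
    patterns = positional_emerging(["abxx", "abyy"], ["xxab", "yyab"])
    print(patterns[("ab", 0)])
```

In this toy input the substring "ab" occurs twice in each class, so an occurrence count without position cannot separate the classes, while the pair ("ab", 0) occurs only in the first class.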

