计算机科学 ›› 2016, Vol. 43 ›› Issue (1): 25-29.doi: 10.11896/j.issn.1002-137X.2016.01.006

• CRSSC-CWI-CGrC2015 • 上一篇    下一篇

一种改进的PrefixSpan算法及其在Web用户行为模式挖掘中的应用

姬浩博,王俊红   

  1. 山西大学计算机与信息技术学院 太原030006,山西大学计算机与信息技术学院 太原030006;山西大学计算机智能与中文信息处理教育部重点实验室 太原030006
  • 出版日期:2018-12-01 发布日期:2018-12-01
  • 基金资助:
    本文受国家自然科学基金(61202018,7,61303008),山西省青年科技基金(2013021018-1),山西省高等学校科技创新项目(2013102)资助

Research on Improved PrefixSpan Algorithm and its Application in Web User Behavior Patterns Mining

JI Hao-bo and WANG Jun-hong   

  • Online:2018-12-01 Published:2018-12-01

摘要: 序列模式挖掘是从序列数据库中挖掘相对时间或其他模式出现频率高的模式。针对PrefixSpan算法构造投影数据库时开销巨大、扫描效率不高的问题,通过以序列扩展代替项集进行扩展、放弃挖掘序列数小于阈值min_support的投影数据库以及直接递归局部频繁项等方式进行改进,并将改进方法应用于Web用户行为模式挖掘中,对日志记录中的规律进行分析和研究。实验分析表明,相比PrefixSpan算法,该改进算法在算法效率方面有一定的提高。

关键词: 序列模式挖掘,Web日志挖掘,PrefixSpan算法

Abstract: Sequential pattern mining is mining relative time or other mode of high frequency from sequence databases.Based on the PrefixSpan algorithm,the paper proposed an improved adaptive algorithm to improve the problem of expensive construction projection database and low scanning efficiency,through the methods such as using sequence expanding instead of item expanding,abandoning the project databases that the number of sequence is less than min_support and so on.Then the new method was used for Web user behavior pattern mining to analyze and research log records law.Experimental results show that,compared to the PrefixSpan algorithm,the improved algorithm has been improved in the algorithm efficiency.

Key words: Sequence pattern mining,Web log mining,PrefixSpan algorithm

[1] Agrawal R,Srikant R.Mining sequential patterns[C]∥Procee-dings of the Eleventh International Conference on Data Engineering(ICDE’95).Washington,DC:IEEE Computer Society,1995:3-14
[2] Srikant R,Agrawal R.Mining sequential patterns:generaliza-tions and performance improvements[C]∥Proceedings of the 5th International Conference on Extending Database Technology:Advances in Database Technology(EDBT’96).Berlin:Springer-Verlag,1996:3-17
[3] Han J,Pei J,Mortazviasl B,et al.FreeSpan:Frequent pattern projected sequential pattern mining[C]∥Proceedings of the 6th ACM-SIGKDD International Conference on Knowledge Disco-very and Data Mining.New York:ACM Press,2000:355-359
[4] Lu J P,Liu Y B,Ni W W,et al.Incremental updating algorithm for mining sequential patterns based on the projection database[J].Journal of Southeast University,2007,6(3):457-462(in Chinese) 陆介平,刘月波,倪巍伟,等.基于投影数据库的序列模式挖掘增量式更新算法[J].东南大学学报,2007,6(3):457-462
[5] Zhang K,Zhu Y Y.Algorithm of sequence pattern mining without repeated database scan projection[J].Journal of Computer Research and Development,2007,4(1):126-132(in Chinese) 张坤,朱扬勇.无重复投影数据库扫描的序列模式挖掘算法[J].计算机研究与发展,2007,44(1):126-132
[6] Han G W.Research on sequential pattern algorithm for data streams based on prefix sequence tree[D].Qinhuangdao:Yanshan University,2013(in Chinese) 韩高伟.基于前缀序列树的数据流序列模式算法研究[D].秦皇岛:燕山大学,2013
[7] Saputra D,Rambli D R A,Foong O M.Sequential Pattern Mi-ning using PrefixSpan with Pseudoprojection and Separator Database[C]∥International Symposium on Infermation Technology,2008.Kuala Lumpur,Malaysia,2008:1-7
[8] Dai Y X,Lian Y F,Wang H.System security and intrusion detection[M].Beijing:Tsinghua University Press,2002(in Chinese)戴英霞,连一峰,王航.系统安全与入侵检测[M].北京:清华大学出版社,2002
[9] Guo L,Xiang X,Shi Y C.Use Web usage mining to assist online E-Learning assessment[C]∥IEEE International Conference on Advanced Learning Technologies.2004:912-913
[10] Xing G,Shen J.Efficient data mining for web navigation patterns [J].Information and Software Technology,2004,46(1):55-63
[11] Xu H Q,Wang Y C.The path Webpage pre fetching model based on analyzing user access[J].Journal of Software,2003,4(6):1142-1147(in Chinese)许欢庆,王永成.基于用户访问路径分析的网页预取模型[J].软件学报,2003,14(6):1142-1147
[12] Yang Z L,Wang Y T,Kitsuregawa M M.An Effective system for mining web log[C]∥ Proceedings of the 8th Asia-Pacific Web Conference on Frontiers of WWW Research ad Development(APWeb’06).Harbin,China,2006:40-52
[13] Mabroukeh N R,Ezeife C I.A Taxonomy of Sequential Pattern Mining Algorithms[J].Journal of ACM Computing Surveys,2010,43(1):1-41
[14] Hu Y H,Wu F,Liao Y J.An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports[J].Journal of Systems and Software,2013,6(5):1224-1238
[15] Hu Y H,Huang C K T,Kao Y H.Knowledge discovery ofweighted RFM sequential patterns from customer sequence databases[J].Journal of Systems and Software,2013,6(3):779-788
[16] Salehi M,IKamalabadi N.Hybrid recommendation approach for learning material based on sequential pattern of the accessed material and the learner’s preference tree[J].Knowledge-Based Systems,2013,8:57-69
[17] Hu Y H,Tsai C F,Tai C T,et al.A novel approach for mining cyclically repeated patterns with multiple minimum supports[J].Applied Soft Computing,2015,8:90-99
[18] Zhang B B,Lin C W,Gan W S,et al.Maintaining the discovered sequential patterns for sequence insertion in dynamic databases[J].Engineering Applications of Artificial Intelligence,2014,5(10):131-142

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75, 88 .
[2] 夏庆勋,庄毅. 一种基于局部性原理的远程验证机制[J]. 计算机科学, 2018, 45(4): 148 -151, 162 .
[3] 厉柏伸,李领治,孙涌,朱艳琴. 基于伪梯度提升决策树的内网防御算法[J]. 计算机科学, 2018, 45(4): 157 -162 .
[4] 王欢,张云峰,张艳. 一种基于CFDs规则的修复序列快速判定方法[J]. 计算机科学, 2018, 45(3): 311 -316 .
[5] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[6] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[7] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[8] 刘琴. 计算机取证过程中基于约束的数据质量问题研究[J]. 计算机科学, 2018, 45(4): 169 -172 .
[9] 钟菲,杨斌. 基于主成分分析网络的车牌检测方法[J]. 计算机科学, 2018, 45(3): 268 -273 .
[10] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99, 116 .