计算机科学 ›› 2018, Vol. 45 ›› Issue (4): 252-256.doi: 10.11896/j.issn.1002-137X.2018.04.042

• 人工智能 • 上一篇    下一篇

基于密度约束和间隙约束的对比模式挖掘

魏芹双,武优西,刘靖宇,朱怀忠   

  1. 河北工业大学计算机科学与软件学院 天津300401 河北省大数据重点实验室 天津300401,河北工业大学计算机科学与软件学院 天津300401 河北省大数据重点实验室 天津300401,河北工业大学计算机科学与软件学院 天津300401 河北省大数据重点实验室 天津300401,河北工业大学计算机科学与软件学院 天津300401 河北省大数据重点实验室 天津300401
  • 出版日期:2018-04-15 发布日期:2018-05-11
  • 基金资助:
    本文受国家自然科学基金(61673159),河北省自然科学基金(F2016202145),黑龙江省自然科学基金(F2017019),河北省科技计划项目(15210325),河北省教育厅青年基金(QN2014192)资助

Distinguishing Sequence Patterns Mining Based on Density and Gap Constraints

WEI Qin-shuang, WU You-xi, LIU Jing-yu and ZHU Huai-zhong   

  • Online:2018-04-15 Published:2018-05-11

摘要: 对比模式挖掘是序列模式挖掘的一个重要分支,带有密度约束的对比模式有助于生物学家发现生物序列中的特殊因子的分布情况。为此,文中提出了MPDG (Mining distinguishing sequence Patterns based on Density and Gap constraint) 算法,该算法应用网树结构挖掘满足密度约束和间隙约束的对比模式,在仅需扫描一遍序列库的情况下,该算法可计算当前模式的所有超模式的支持度,从而提高挖掘效率。最后,在真实蛋白质数据集上进行实验,实验结果验证了MPDG算法的有效性。

关键词: 模式挖掘,对比模式,密度约束,网树

Abstract: Distinguishing patterns mining is an important branch of sequence patterns mining,and distinguishing patterns with density constraint can help biologists to find the distribution of special factors on biological sequences.This paper proposed an algorithm,named MPDG(Mining distinguishing sequence Patterns based on Density and Gap constraint),which employs Nettree data structure to mine the distinguishing patterns satisfying the density and gap constraints.The algorithm is efficient since it calculates all super-patterns’ supports of current pattern with one-way scanning the sequence database.Experimental results on real protein datasets verify the effectiveness of MPDG.

Key words: Pattern mining,Distinguishing pattern,Density constraint,Nettree

[1] AGRAWAL R,SRIKANT R.Mining sequential patterns [C]∥11th International Conference on Data Engineering.1995:3-14.
[2] ZHANG L,LUO P,TANG L,et al.Occupancy-based frequentpattern mining [J].ACM Transactions on Knowledge Discovery from Data(ACM TKDD),2015,10(2):1-33.
[3] MIN F,WU Y,WU X.The Apriori property of sequence pattern mining with wildcard gaps [J].International Journal Functional Informatics and Personalised Medicine,2010,4(1):138-143.
[4] DING B,LO D,HAN J,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]∥International Conference on Data Engineering.IEEE Computer Society,2009:1024-1035.
[5] FANG W W,XIE W,HUANG H B,et al.Sequential pattern mining based on privacy preserving [J].Computer Science,2016,43(12):195-199.(in Chinese) 方炜炜,谢伟,黄宏博,等.基于隐私保护的序列模式挖掘[J].计算机科学,2016,43(12):195-199.
[6] DONG G,LI J.Efficient mining of emerging patterns:Discovering trends and differences[C]∥Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,1999:43-52.
[7] GHOSH S,FENG M,NGUYEN H,et al.Risk prediction foracute hypotensive patients by using gap constrained sequential contrast patterns [C]∥AMIA Annual Symposium Proceedings American Medical Informatics Association.2014:1748-1754.
[8] JI X,JAMES B,DONG G.Mining minimal distinguishing subsequence patterns with gap constraints [J].Knowledge Information Systems,2007,11(3):259-286.
[9] WANG X,DUAN L,DONG G,et al.Efficient mining of density-aware distinguishing sequential patterns with gap constraints [C]∥19th International Conference of Database Systems for Advanced Applications.2014:372-387.
[10] YANG H,DUAN L,HU B,et al.Mining top-k distinguishing sequential patterns with gap constraint [J].Journal of Software,2015,6(11):2994-3009.(in Chinese) 杨皓,段磊,胡斌,等.带间隔约束的Top-k对序列模式挖掘[J].软件学报,2015,26(11):2994-3009.
[11] WANG H F,DUAN L,ZUO J,et al.Efficient mining of distinguishing sequential patterns without a predefined gap constraint [J].Journal of Computer,2016,39(10):1979-1991.(in Chinese) 王慧锋,段磊,左劼,等.免预设间隔约束的对比序列模式高效挖掘[J].计算机学报,2016,39(10):1979-1991.
[12] WU Y X,WU X D,JIANG H,et al.A heuristic algorithm for MPMGOOC [J].Journal of Computers,2011,4(8):1452-1462.(in Chinese) 武优西,吴信东,江贺,等.一种求解MPMGOOC问题的启发式算法[J].计算机学报,2011,34(8):1452-1462.
[13] WU Y,TANG Z,JIANG H,et al.Approximate Pattern Matching with Gap Constraints[J].Journal of Information Science,2016,42(5):639-658.
[14] WU Y,FU S,JIANG H,et al.Strict approximate pattern matching with general gaps[J].Applied Intelligence,2015,42(3):566-580.
[15] WU Y,WANG L,REN J,et al.Mining sequential patterns with periodic wildcard gaps[J].Applied Intelligence,2014,41(1):99-116.

No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 编辑部. 新网站开通,欢迎大家订阅![J]. 计算机科学, 2018, 1(1): 1 .
[2] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[3] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[4] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[5] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[6] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[7] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[8] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[9] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[10] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .