计算机科学 ›› 2019, Vol. 46 ›› Issue (11): 251-259.doi: 10.11896/jsjkx.191100505C

• 图形图像与模式识别 • 上一篇    下一篇

基于领域偏好的可变时间窗口时序数据主题模式识别算法

王一博1,2, 彭广举1,2, 何远舵1,2, 王亚沙1,3, 赵俊峰1,2, 王江涛1,2   

  1. (高可信软件技术教育部重点实验室(北京大学) 北京100871)1
    (北京大学信息科学技术学院 北京100871)2
    (北京大学软件工程国家工程研究中心 北京100871)3
  • 收稿日期:2018-10-03 出版日期:2019-11-15 发布日期:2019-11-14
  • 通讯作者: 王亚沙(1975-),男,博士,教授,CCF高级会员,主要研究领域为城市计算、数据分析、软件工程,E-mail:wangyasha@pku.edu.cn
  • 作者简介:王一博(1993-),男,硕士生,主要研究领域为普适计算、机器学习、数据挖掘;彭广举(1995-),男,硕士生,主要研究领域为普适计算、机器学习、数据挖掘;何远舵(1992-),男,博士生,主要研究领域为普适计算,数据挖掘;赵俊峰(1974-),女,博士,副教授,主要研究领域为软件工程、知识工程、大数据分析等;王江涛(1987-),男,博士,助理研究员,CCF会员,主要研究领域为移动计算、群智感知、社会计算。
  • 基金资助:
    本文受国家自然科学基金重点支持项目(91546203),国家电网公司总部科技项目(JS71-16-005)资助。

Time Series Motif Discovery Algorithm of Variable Length Based on Domain Preference

WANG Yi-bo1,2, PENG Guang-ju1,2, HE Yuan-duo1,2, WANG Ya-sha1,3, ZHAO Jun-feng1,2, WANG Jiang-tao1,2   

  1. (Key Lab of High Confidence Software Technologies(Peking University),Ministry of Education,Beijing 100871,China)1
    (School of Electronics Engineering and Computer Science,Peking University,Beijing 100871,China)2
    (National Engineering Research Center for Software Engineering,Peking University,Beijing 100871,China)3
  • Received:2018-10-03 Online:2019-11-15 Published:2019-11-14

摘要: 随着传感器的普及,智慧城市、普适计算等领域应用不断涌现,对时序数据处理的需求也在不断增长。时序数据中反复出现的高度相似的模式被称为主题模式。时序数据的主题模式蕴含有了大量的信息,对主题模式的识别是时序数据处理的重要分支领域。现有主题模式识别算法无法根据特定应用或领域的知识来指定主题模式识别的偏好,从而难以发现对分析领域问题最具价值的模式。针对这一问题,文中给出了一种可以根据领域偏好定义子序列相似性的机制,并设计了一种针对上述相似性度量机制的可变时间窗口主题模式识别加速剪枝算法。实验证明,所提方法在多个公开数据集上,能高效且准确地发现具有领域偏好的主题模式。

关键词: 时序数据, 主题模式, 领域偏好, 可变时间窗口, 主题模式实例

Abstract: With the development of ubiquitous computing,more and more sensors are installed in our daily applications.As a result,the demand for time series data processing is very high.The similar pattern which appears in time series data several times are called time series motif.Motif contains huge amounts of information in time series data.Motif discovery is one of the most important work in motif analysis.State-of-art motif discovery algorithm cannot find proper motif based on domain knowledge.As a result,such algorithm cannot find most valuable motif.Aiming at this problem,this paper used domain distance to evaluate the similarities of subsequences based on domain knowledge.By using the new distance,this paper developed a branching method to discovery motif with variable length.Several data from real life are used to test the performance of the algorithm.The results show that the proposed algorithm can find motif with domain knowledge accurately.

Key words: Time series data, Motif, Domain knowledge, Variable time window, Motif example

中图分类号: 

  • TP274
[1] BOX G E P,JENKINS G M,REINSEL G C,et al.Time series analysis:forecasting and control[M].New Jersey:John Wiley & Sons,2015.
[2] PATEL P,KEOGH E,LIN J,et al.Mining motifs in massive time series databases[C]∥Proceedings of 2002 IEEE International Conference on Data Mining.IEEE,2002:370-377.
[3] LONARDI J,PATEL P.Finding motifs in time series[C]∥Proceedings of the 2nd Workshop on Temporal Data Mining.2002:53-68.
[4] WANG H,ZHANG D,WANG Y,et al.RT-Fall:A real-time and contactless fall detection system with commodity WiFi devices[J].IEEE Transactions on Mobile Computing,2017,16(2):511-526.
[5] BROWN A E X,YEMINI E I,GRUNDY L J,et al.A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion[J].Proceedings of the National Academy of Sciences,2013,110(2):791-796.
[6] LIN J,KEOGH E,FU A,et al.Approximations to magic:Finding unusual medical time series[C]∥18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05).IEEE,2005:329-334.
[7] BARRENETXEA G,INGELREST F,SCHAEFER G,et al.Sensorscope:Out-of-the-box environmental monitoring[C]∥Proceedings of the 7th International Conference on Information Processing in Sensor Networks.IEEE Computer Society,2008:332-343.
[8] MCGOVERN A,ROSENDAHL D H,BROWN R A,et al.Identifying predictive multi-dimensional time series motifs:an application to severe weather prediction[J].Data Mining and Know-ledge Discovery,2011,22(1-2):232-258.
[9] SHOKOOHI-YEKTA M,CHEN Y,CAMPANA B,et al.Discovery of meaningful rules in time series[C]∥Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2015:1085-1094.
[10] KEOGH E,KASETTY S.On the need for time series datamining benchmarks:a survey and empirical demonstration[J].Data Mining and Knowledge Discovery,2003,7(4):349-371.
[11] MUEEN A,KEOGH E,ZHU Q,et al.Exact discovery of time series motifs[C]∥Proceedings of the 2009 SIAM International Conference on Data Mining.Society for Industrial and Applied Mathematics,2009:473-484.
[12] YEH C C M,ZHU Y,ULANOVA L,et al.Matrix profile I:all pairs similarity joins for time series:a unifying view that includes motifs,discords and shapelets[C]∥2016 IEEE 16th international conference on data mining (ICDM).IEEE,2016:1317-1322.
[13] ZHU Y,ZIMMERMAN Z,SENOBARI N S,et al.Matrix profile ii:Exploiting a novel algorithm and gpus to break the one hundred million barrier for time series motifs and joins[C]∥2016 IEEE 16th International Conference on Data Mining (ICDM).IEEE,2016:739-748.
[14] MUEEN A,CHAVOSHI N.Enumeration of time series motifs of all lengths[J].Knowledge and Information Systems,2015,45(1):105-132.
[15] DAU H A,KEOGH E.Matrix Profile V:A Generic Technique to Incorporate Domain Knowledge into Motif Discovery[C]∥Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2017:125-134.
[16] MAKONIN S,ELLERT B,BAJIC' I V,et al.Electricity,water,and natural gas consumption of a residential house in Canada from 2012 to 2014[J].Scientific Ata,2016,3:160037-160037.
[17] KUBÁNEK J,MILLER K J,OJEMANN J G,et al.DecodingFlexion Of Individual Fingers Using electrocorticographic signals in humans[J].Journal of Neural Engineering,2009,6(6):066001-066001.
[1] 程文聪,邹鹏,贾焰. 基于小波概要的区间差分skyline研究[J]. 计算机科学, 2010, 37(11): 160-165.
[2] 宋应湃 汪林林. 数据挖掘技术在IT基础设施监控系统中的应用[J]. 计算机科学, 2007, 34(5): 205-207.
[3] 肖晶 黄国兴 赵若韵 黄豫蕾. 时间序列的快速相似性搜索改进算法[J]. 计算机科学, 2003, 30(9): 97-99.
[4] 周丽华 王丽珍. 基于小波变换的例外挖掘[J]. 计算机科学, 2002, 29(2): 127-129.
[5] 段立娟 高文. 时序数据库中相似序列的挖掘[J]. 计算机科学, 2000, 27(5): 39-44.
[6] 王清毅 范焱. 基于时序逻辑的时序数据库中知识发现方法[J]. 计算机科学, 1999, 26(8): 68-70.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 雷丽晖,王静. 可能性测度下的LTL模型检测并行化研究[J]. 计算机科学, 2018, 45(4): 71 -75 .
[2] 孙启,金燕,何琨,徐凌轩. 用于求解混合车辆路径问题的混合进化算法[J]. 计算机科学, 2018, 45(4): 76 -82 .
[3] 张佳男,肖鸣宇. 带权混合支配问题的近似算法研究[J]. 计算机科学, 2018, 45(4): 83 -88 .
[4] 伍建辉,黄中祥,李武,吴健辉,彭鑫,张生. 城市道路建设时序决策的鲁棒优化[J]. 计算机科学, 2018, 45(4): 89 -93 .
[5] 史雯隽,武继刚,罗裕春. 针对移动云计算任务迁移的快速高效调度算法[J]. 计算机科学, 2018, 45(4): 94 -99 .
[6] 周燕萍,业巧林. 基于L1-范数距离的最小二乘对支持向量机[J]. 计算机科学, 2018, 45(4): 100 -105 .
[7] 刘博艺,唐湘滟,程杰仁. 基于多生长时期模板匹配的玉米螟识别方法[J]. 计算机科学, 2018, 45(4): 106 -111 .
[8] 耿海军,施新刚,王之梁,尹霞,尹少平. 基于有向无环图的互联网域内节能路由算法[J]. 计算机科学, 2018, 45(4): 112 -116 .
[9] 崔琼,李建华,王宏,南明莉. 基于节点修复的网络化指挥信息系统弹性分析模型[J]. 计算机科学, 2018, 45(4): 117 -121 .
[10] 王振朝,侯欢欢,连蕊. 抑制CMT中乱序程度的路径优化方案[J]. 计算机科学, 2018, 45(4): 122 -125 .