计算机科学 ›› 2024, Vol. 51 ›› Issue (10): 247-260.doi: 10.11896/jsjkx.230800146

• 数据库&大数据&数据科学 • 上一篇    下一篇

面向多尺度最近时间序列的全序链集挖掘算法

王少鹏1,2,3,4, 冯淳恺1   

  1. 1 内蒙古大学软件学院 呼和浩特 010021
    2 内蒙古教育部生态大数据工程研究中心 呼和浩特 010021
    3 内蒙古云计算与服务工程实验室 呼和浩特 010021
    4 内蒙古纪检监察大数据实验室 呼和浩特 010021
  • 收稿日期:2023-08-22 修回日期:2023-09-22 出版日期:2024-10-15 发布日期:2024-10-11
  • 通讯作者: 王少鹏(wangsp@imu.edu.cn)
  • 基金资助:
    国家自然科学基金(62066034,62262047);内蒙古科技计划基金(61862047);内蒙古纪检监察大数据实验室开放课题基金(IMDBD2020011)

All-chain Sets Mining Algorithm for Multi-scale Nearest Time Series

WANG Shaopeng1,2,3,4, FENG Chunkai1   

  1. 1 School of Software Engineering,Inner Mongolia University,Hohhot 010021,China
    2 Inner Mongolia Engineering Research Center of Ecological Big Data Ministry of Education,Hohhot,010021,China
    3 Inner Mongolia Engineering Laboratory for Cloud Computing and Service,Hohhot 010021,China
    4 Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory,Hohhot 010021,China
  • Received:2023-08-22 Revised:2023-09-22 Online:2024-10-15 Published:2024-10-11
  • About author:WANG Shaopeng,born in 1984,Ph.D, associate professor,master supervisor,is a member of CCF(No.26971M).His main research interests include big data mining,spatio-temporal big data processing and analyzing,convergence of AI and DB.
  • Supported by:
    National Natural Science Foundation of China(62066034,62262047),Inner Mongolia Science & Technology Plan(61862047) and Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory Open Project Fund(IMDBD2020011).

摘要: 挖掘时间序列中的全链集是一个新兴领域。据了解,当前并无多尺度最近时间序列的全链集挖掘算法存在。对多尺度最近时间序列下全序链集的挖掘问题进行研究,在现有LRSTOMP和ALLC算法的基础上提出了一种具有增量计算特性的挖掘算法MTSC(Mining Time Series All-Chain Sets over Multi-scale Nearest Time Series,MTSC)。该算法依次使用LRSTOMP与ALLC算法对第一个最近时间序列成员内容进行处理,得到该成员上的全序链集挖掘结果,同时保留该成员相关的PL和PR结构。从第二个最近时间序列成员开始,MTSC算法中的LRSTOMP过程只需要处理当前最近时间序列成员相对于前一个最近时间序列成员的新增部分,进一步结合前一个最近时间序列成员上的PL和PR,可以增量获得当前最近时间序列成员上的PL和PR结构,在此基础上使用ALLC算法得到该成员上的全序链集挖掘结果。相较于对每一个最近时间序列成员内容都使用LRSTOMP和ALLC算法处理的Naive方式,MTSC算法利用增量计算的思想,避免了对全部数据进行重复性计算,从而加快了算法的执行速度,具有更高的时间效率。基于公有数据样本Penguin和TiltABP的仿真实验验证了该算法的有效性,实验结果表明其性能与Naive算法完全一致,且对于以上数据样本,在空间开销增加1.1%~9.7%的情况下,可以实现时间效率80%~88.3%的提升。

关键词: 时间序列, 内容演化, 时间序列链, 全序链集, 增量计算

Abstract: Mining all-chain set in the time series is an emerging area.To the best of our knowledge,no method has been proposed to mining all-chain sets over multi-scale nearest time series.In this paper,the problem of mining all-chain sets over multi-scale nearest time series is focused.The mining problem of all-chain sets over multi-scale nearest time series is studied,and a mining algorithm with incremental computation characteristics is proposed on the basis of the existing LRSTOMP and ALLC algorithms,MTSC(mining time series all-chain sets over multi-scale nearest time series).The MTSC algorithm uses the LRSTOMP and ALLC algorithms sequentially to process the content of the 1st nearest time series member to obtain the mining results of all-chain sets over this member,while keeping the PL and PR structures associated with this member.Starting from the 2nd nearest time series member,the LRSTOMP process in the MTSC algorithm only needs to deal with the additions of the current nearest time series member with respect to the previous nearest time series member,and further combining the PL and PR on the pre-vious nearest time series member can incrementally obtain the structure of the PL and PR on the current nearest time series member,and based on which the ALLC algorithm is used to get the all-chain set mining result on that member.Compared to the Naive way using LRSTOMP and ALLC algorithms to process the content of each recent time series member,the MTSC algorithm avoids repetitive computation on all data by utilizing the idea of incremental computation,which improves the execution speed of the algorithm and has better time efficiency.Simulation experiments based on the common data samples Penguin and TiltABP verify the effectiveness of the proposed algorithm,and the experiment results show that the results of the MTSC algorithm are completely consistent with that of the Naive algorithm,and the MTSC algorithm can achieve 80%~ 88.3% improvement in time efficiency for the above data samples with an increase in space overhead of 1.1% ~ 9.7%.

Key words: Time series, Content evolution, Time series chain, All-chain set, Incremental calculation

中图分类号: 

  • TP311
[1]LI Z X,LIU H Y.Combining Global and Sequential Patterns for Multivariate Time Series Forecasting [J].Chinese Journal of Computers,2023,46(1):70-84.
[2]PATEL P,KEOGH E,LIN J,et al.Mining Motifs in Massive Time Series Databases[C]//International Conference on Data Mining.IEEE Computer Society,2002:370-377.
[3]MORI U,MENDIBURU A,KEOGH E,et al.Reliable EarlyClassification of Time Series Based on Discriminating the Classes over time[J].Data Mining & Knowledge Discovery,2016,31(1):1-31.
[4]DAU H A,BEGUM N,KEOGH E.Semi-Supervision Dramatically Improves Time Series Clustering under Dynamic Time Warping[C]//International Conference on Information and Know-ledge Management.Association for Computing Machinery,2016:999-1008.
[5]HAO M C,MARWAH M,JANETZKO H,et al.Visual Explo-ration of Frequent Patterns in Multivariate Time Series[J].Information Visualization,2012,11(1):71-83.
[6]SHOKOOHI-YEKTA M,CHEN Y P,CAMPANA B,et al.Dis-covery of Meaningful Rules in Time Series[C]//International Conference on Knowledge Discovery & Data Mining.Association for Computing Machinery,2015:1085-1094.
[7]ZHU Y,IMAMURA M,NIKOVSKI D,et al.Matrix ProfileVII:Time Series Chains:A New Primitive for Time Series Data Mining[C]//International Conference on Data Mining.IEEE,2017:695-704.
[8]HELDT T,OEFINGER M B,HOSHIYAMA M,et al.Circulatory Response to Passive and Active Changes in Posture[C]//Computers in Cardiology.IEEE,2003:263-266.
[9]ZHU Y,IMAMURA M,NIKOVSKI D.et al.Introducing Time Series Chains:A New Primitive for Time Series Data Mining[J].Knowledge and Information Systems,2019,60(2):1135-1161.
[10]TIAN X Z,HU A GU S Y.Research on Stock Market Prediction Method Based on Time Series Chain[J].Journal of Zhejiang University of Technology,2021,49(5):503-510,563.
[11]ZHENG X W,ZOU X N,REN X X,et al.ECG Prediction Based on Bidirectional Time Series Chain Discovery Algorithm[C]//International Conference on Crowd Science and Engineering.Association for Computing Machinery,2021:95-102.
[12]ZHANG L,PATEL N,LI X Q,et al.Joint Time Series Chain:Detecting Unusual Evolving Trend Across Time Series[C]//2022 SIAM International Conference on Data Mining.2022:208-216.
[13]IMAMURA M,NAKAMURA T,KEOGH E.MatrixPro-fileXXI:A Geometric Approach to Time Series Chains Improves Robustness[C]//The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining(KDD'20).Association for Computing Machinery,2020:1114-1122.
[14]ZHU Y,ZIMMERMAN Z,SENOBARI N S,et al.Matrix Profile II:Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins[C]//IEEE International Conference on Data Mining.IEEE,2016:739-748.
[15]WANG S P,YUAN Y,LI H.Discovering All-Chain Set inStreaming Time Series[C]//The 23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining.Springer Verlag,2019:306-318.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!