计算机科学 ›› 2023, Vol. 50 ›› Issue (2): 146-157.doi: 10.11896/jsjkx.211200065

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于优化和两阶段筛选的时间序列Shapelets提取研究

李晨, 万源   

  1. 武汉理工大学理学院 武汉 430070
  • 收稿日期:2021-12-06 修回日期:2022-04-22 出版日期:2023-02-15 发布日期:2023-02-22
  • 通讯作者: 万源(wanyuan@whut.edu.cn)
  • 作者简介:(lichenstatistics@whut.edu.cn)
  • 基金资助:
    中央高校基本科研业务费专项资金(2021III030JC)

Study on Time Series Shapelets Extraction Based on Optimization and Two-phase Filtering

LI Chen, WAN Yuan   

  1. School of Science,Wuhan University of Technology,Wuhan 430070,China
  • Received:2021-12-06 Revised:2022-04-22 Online:2023-02-15 Published:2023-02-22
  • Supported by:
    Fundamental Research Funds for the Central Universities of Ministry of Education of China(2021III030JC)

摘要: 与基于全局特征的时间序列分类方法相比,基于shapelets的分类方法在可解释性和分类速度方面更具优势。针对现有的优化模型学习到的shapelets判别力不足以及shapelets候选数量太多等问题,提出了基于优化和两阶段筛选的时间序列shapelets提取算法。首先对时间序列取样,结合极值点和序列趋势对取样的时间序列进行分组,根据分组结果对稀疏组Lasso正则器的每项赋予权重,并在加权稀疏组Lasso的每一组中都使用融合罚正则项来保证解的相邻位置平坦变化,将多项稀疏正则项作为正则器与局部线性判别分析相结合来构建目标函数。然后,建立一个两阶段的筛选框架来度量组的稀疏性,从而快速地找到对分类起决定性作用的关键组。最后仅使用一组关键组来提取shapelets用于时间序列的分类,缩小了shapelets的规模。在28个时间序列数据集上进行了大量实验,实验结果表明,与现有的基于shapelets的提取方法相比,所提方法不仅能显著提高分类精度,具有较高的时间效率,而且能够在一定程度上缩小shapelets的规模。

关键词: hapelets, 两阶段筛选框架, 加权稀疏组Lasso, 融合罚, 关键组

Abstract: Compared with the time series classification methods based on global features,the shapelet-based methods have more advantages in interpretability,efficiency and accuracy.In order to solve the problems of insufficient discrimination of shapelets obtained from existing sparse models and the large scale of shapelets candidates,this paper proposes a shapelets extraction method based on optimization and two-phase filtering.First,the time series are sampled,and the sampled time series are grouped by combining the extreme points and the trend,then the weight of each item in the sparse group lasso regularizer are assigned according to the grouping results.The fused penalty regularization is used in each group of the weighted sparse group lasso to ensure that the adjacent positions of the solution change smoothly.Those sparse regularization terms are combined as constraints to construct the objective function together with the local fisher discriminant analysis.Then,a two-phase filtering framework is established to measure the sparsity of groups,so as to quickly locate the key group that plays a decisive role in classification.Finally,this key group is retained to extract shapelets for time series classification,which reduces the candidates of shapelets.Extensive experiments are carried out on 28 datasets,and the experimental results show that,compared with the existing shapelets-based extraction methods,the proposed method significantly improves the classification accuracy with a good efficiency,and reduces the scale of shapelets to a certain extent.

Key words: Shapelets, Two-phase filtering framework, Weighted sparse group Lasso, Fused penalty, Key group

中图分类号: 

  • TP391
[1]AU YEUNG J F K,WEI Z K,CHAN K Y,et al.Jump Detection in Financial Time Series Using Machine Learning Algorithms[J].Soft Computing,2020,24(3):1789-1801.
[2]AL-HADEETHI H,ABDULLA S,DIYKH M,et al.Adaptive Boost LS-SVM Classification Approach for Time-Series Signal Classification in Epileptic Seizure Diagnosis Applications[J].Expert Systems with Applications,2020,161:113676.
[3]ARUL M,KAREEM A.Applications of Shapelet Transform to Time Series Classification of Earthquake,Wind and Wave Data[J].Engineering Structures,2021,228:111564.
[4]AHMED T,SINGH D.Probability Density Functions BasedClassification of MODIS NDVI Time Series Data and Monitoring of Vegetation Growth Cycle[J].Advances in Space Research,2020,66(4):873-886.
[5]LI H L,LIANG Y,WANG S C.Reviewon Dynamic Time Warping in Time Series Data Mining[J].Control and Decision,2018,33(8):1345-1353.
[6]RATANAMAHATANA C A,KEOGH E J.Three Myths about Dynamic Time Warping Data Mining[C]//Proceedings of the 2005 SIAM International Conference on Data Mining.2005:506-510.
[7]DING H,TRAJCEVSKI G,SCHEUERMANN P,et al.Querying and Mining of Time Series Data:Experimental Comparison of Representations and Distance Measures[J].Proceedings of the VLDB Endowment,2008,1(2):1542-1552.
[8]YE L,KEOGH E.Time SeriesShapelets:A New Primitive for Data Mining[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2009:947-956.
[9]KRUSKAL W H.A Nonparametric Test for the Several Sample Problem[J].Annals of Mathematical Statistics,1952,23(4):525-540.
[10]THEOBALD R B C M.Introduction tothe Theory of Statistics[J].Journal of the Royal Statistical Society Series A(General),1974,137(1):95-96.
[11]HILLS J,LINES J,BARANAUSKAS E,et al.Classification of Time Series by Shapelet Transformation[J].Data Mining and Knowledge Discovery,2014,28(4):851-881.
[12]MUEEN A,KEOGH E,YOUNG N.Logical-Shapelets:An Expressive Primitive for Time Series Classification[C]//Procee-dings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2011:1154-1162.
[13]LIN J,KEOGH E,WEI L,et al.Experiencing SAX:A Novel Symbolic Representation of Time Series[J].Data Mining and Knowledge Discovery,2007,15(2):107-144.
[14]YAHYAOUI H,AL-DAIHANI R.A Novel Trend Based SAX Reduction Technique for Time Series[J].Expert Systems with Applications,2019,130:113-123.
[15]PARK H,JUNG J Y.SAX-ARM:Deviant Event Pattern Discovery from Multivariate Time Series Using Symbolic Aggregate Approxima-tion and Association Rule Mining[J].Expert Systems with Applications,2020,141:112950.
[16]BUHLER J,TOMPA M.Finding Motifs Using Random Projections[J].Journal of Computational Biology,2002,9(2):225-242.
[17]RAKTHANMANON T,KEOGH E.Fast shapelets:A Scalable Algorithm for Discovering Time Series Shapelets[C]//Procee-dings of the 2013 SIAM International Conference on Data Mi-ning.2013:668-676.
[18]WISTUBA M,GRABOCKA J,SCHMIDT-THIEME L.Ultra-Fast Shapelets for Time Series Classifi-cation[J].arXiv:1503.05018,2015.
[19]GORDON D,HENDLER D,ROKACH L.Fast and Space-Efficient Shapelets-Based Time-Series Classification[J].Intelligent Data Analysis,2015,19(5):953-981.
[20]LI Z S,HE Z F.Time Series Shapelet Extraction Based on Principal Component Analysis[J].Computer Systems Applications,2014,23(11):145-149.
[21]GRABOCKA J,SCHILLING N,WISTUBA M,et al.Learning Time-Series Shapelets[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.2014:392-401.
[22]HOU L,KWOK J,ZURADA J.Efficient Learning of Timeseries Shapelets[C]//Proceedings of the AAAI Conference on Artificial Intelligence.2016:1209-1215.
[23]KARAMPATZIAKIS N,MINEIRO P.Discriminative Features Via Generalized Eigenvectors[C]//International Conference on Machine Learning.2014:494-502.
[24]TIBSHIRANI R,SAUNDERS M,ROSSET S,et al.Sparsityand Smoothness Via the Fused Lasso[J].Journal of the Royal Statistical Society:Series B(Statistical Methodology),2005,67(1):91-108.
[25]ZHANG Z,ZHANG H,WEN Y,et al.Discrimina-tive Extraction of Features from Time Series[J].Neurocomputing,2018,275:2317-2328.
[26]SUGIYAMA M.Local Fisher Discriminant Analysis for Supervised Dimensionality Reduction[C]//Proceedings of the 23rd International Conference on Machine Learning.2006:905-912.
[27]WANG Y,EMONET R,FROMONT E,et al.Learning Inter-pretable Shapelets for Time Series Classification through Adversarial Regularization[J].arXiv:1906.00917,2019.
[28]WANG Z,YAN W,OATES T.Time Series Classification from Scratch with Deep Neural Networks:A Strong Baseline[C]//2017 International Joint Conference on Neural Networks.2017:1578-1585.
[29]ZHAO H Y,PAN Z S.Multivariate Time Series Classification Based on Shapelets Learning[J].Computer Science,2018,45(5):180-184,219.
[30]SIMON N,FRIEDMAN J,HASTIE T,et al.A Sparse-GroupLasso[J].Journal of Computational and Graphical Statistics,2013,22(2):231-245.
[31]TIBSHIRANI R.Regression Shrinkage and Selection Via theLasso[J].Journal of the Royal Statistical Society:Series B(Methodological),1996,58(1):267-288.
[32]YUAN M,LIN Y.Model Selection and Estimation in Regression with Grouped Variables[J].Journal of the Royal Statistical Society:Series B(Statistical Methodology),2006,68(1):49-67.
[33]JI C,ZHAO C,LIU S,et al.A Fast Shapelet Selection Algorithm for Time Series Classification[J].Computer Networks,2019,148:231-240.
[34]ZHAO C,WANG T J,LIU S J,et al.A Fast Time Series Shapelet Discovery Algorithm Combining Selective Extraction and Subclass Clustering[J].Journal of Software,2020,31(3):763-777.
[35]FRIEDMAN J H.Regularized Discriminant Analysis[J].Journal of the American Statistical Association,1989,84(405):165-175.
[1] 赵慧赟,潘志松.
基于shapelets学习的多元时间序列分类
Multivariate Time Series Classification Based on Shapelets Learning
计算机科学, 2018, 45(5): 180-184. https://doi.org/10.11896/j.issn.1002-137X.2018.05.030
[2] 丁剑,王树英.
一种使用shapelets的增量式时间序列分类
Incremental Time Series Classification Algorithm Based on Shapelets
计算机科学, 2016, 43(5): 257-260. https://doi.org/10.11896/j.issn.1002-137X.2016.05.048
[3] 张颖淳,王淑良.
基于复杂网络理论的电力基础设施网络关键组件辨识
Critical Components Identification of Power Infrastructure Systems Based on Complex Network Theory
计算机科学, 2014, 41(5): 275-279. https://doi.org/10.11896/j.issn.1002-137X.2014.05.058
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!