Computer Science ›› 2022, Vol. 49 ›› Issue (7): 40-49.doi: 10.11896/jsjkx.210700226

Special Issue: Big Data & Data Scinece

• Database & Big Data & Data Science • Previous Articles     Next Articles

Random Shapelet Forest Algorithm Embedded with Canonical Time Series Features

GAO Zhen-zhuo, WANG Zhi-hai, LIU Hai-yang   

  1. School of Computer and Information Technology,Beijing Jiaotong University,Beijing 100044,China
    Beijing Key Laboratory of Traffic Data Analysis and Mining,Beijing 100044,China
  • Received:2021-07-23 Revised:2022-02-28 Online:2022-07-15 Published:2022-07-12
  • About author:GAO Zhen-zhuo,born in 1997,master,is a member of China Computer Federation.His main research interests include data mining and machine learning.
    LIU Hai-yang,born in 1987,Ph.D,lecturer,is a member of China Computer Federation.His main research interests include data mining and so on.
  • Supported by:
    National Natural Science Foundation of China(61771058)and Beijing Natural Science Foundation of China(4214067).

Abstract: In recent years,the research on the classification of time series has attracted more and more attention.Advanced time series classification methods are usually based on great feature representations.Shapelet refers to the discriminative subsequences in time series,which can effectively express the local shape characteristics of time series.However,the high computational cost greatly limits the practicability of the Shapelet-based time series classification methods.In addition,traditional Shapelet can only describe the overall shape characteristics of the subsequence under Euclidean distance metric,so it is easy to be disturbed by noise and is difficult to mine other types of discriminative information contained in the subsequence.To deal with the aforementioned problems,a new time series classification algorithm,named random Shapelet forest embedded with canonical time series features,is proposed in this paper.The proposed algorithm is based on the following three key strategies:1)randomly select Shapelet and limit the scope of Shapelet to improve efficiency;2)embed multiple canonical time series features in Shapelet to improve the adaptability of the algorithm to different classification problems and make up for the accuracy loss caused by the random selection of Shapelet;3)build a random forest classifier based on the new feature representations to ensure the generalization ability of the algorithm.Experimental results on 112 UCR time series datasets show that the proposed algorithm is more accurate than the STC algorithm which is based on Shapelet exact search and the Shapelet transform technique,as well as many other types of state-of-the-art time series classification algorithms.Moreover,extensive experimental comparisons verify the significant advantages of the proposed algorithm in terms of efficiency.

Key words: Canonical time series features, Classification, Random forest, Shapelet, Time series

CLC Number: 

  • TP181
[1]GRABOCKA J,SCHILLING N,WISTUBA M,et al.Learningtime-series shapelets[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2014:392-401.
[2]YAN W H,LI G L.Research on time series classification based on shapelet[J].Computer Science,2019,46(1):29-35.
[3]MIDDLEHURST M,LARGE J,BAGNALL A.The canonical interval forest(CIF) classifier for time series classification[J].arXiv:2008.09172,2020.
[4]LEE W,PARK S,JOO W,et al.Diagnosis prediction via medical context attention networks using deep generative modeling[C]//2018 IEEE International Conference on Data Mining.IEEE,2018:1104-1109.
[5]LUCAS B,SHIFAZ A,PELLETIER C,et al.Proximity forest:an effective and scalable distance-based classifier for time series[J].Data Mining and Knowledge Discovery,2019,33(3):607-635.
[6]FAWAZ H I,FORESTIER G,WEBER J,et al.Deep learningfor time series classification:a review[J].Data Mining and Knowledge Discovery,2019,33(4):917-963.
[7]DAU H A,KEOGH E,KAMGAR K,et al.The UCR Time Series Classification Archive[EB/OL].https://www.cs.ucr.edu/~eamonn/time_series_data_2018.
[8]YE L,KEOGH E.Time series shapelets:a novel technique that allows accurate,interpretable and fast classification[J].Data Mining and Knowledge Discovery,2011,22(1):149-182.
[9]RAKTHANMANON T,KEOGH E.Fast shapelets:a scalablealgorithm for discovering time series shapelets[C]//Proceedings of the 13th SIAM International Conference on Data Mining.SIAM,2013:668-676.
[10]KARLSSON I,PAPAPETROU P,BOSTRÖM H.Generalizedrandom shapelet forests[J].Data Mining and Knowledge Discovery,2016,30(5):1053-1085.
[11]LI G L,YAN W H,WU Z D.Discovering shapelets with key points in time series classification[J].Expert Systems with Applications,2019,132:76-86.
[12]ZHANG Z G,ZHANG H W,WEN Y L,et al.Discriminative extraction of features from time series[J].Neurocomputing,2018,275:2317-2328.
[13]MUEEN A,KEOGH E,YOUNG N.Logical-shapelets:an expressive primitive for time series classification[C]//Proceedings of the 17th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.ACM,2011:1154-1162.
[14]DENG H,RUNGER G,TUV E,et al.A time series forest for classification and feature extraction[J].Information Sciences,2013,239:142-153.
[15]LUBBA C H,SETHI S S,KNAUTE P,et al.Catch22:canonical time-series characteristics[J].Data Mining and Knowledge Discovery,2019,33(6):1821-1852.
[16]HILLS J,LINES J,BARANAUSKAS E,et al.Classification of time series by shapelet transformation[J].Data Mining and Knowledge Discovery,2014,28(4):851-881.
[17]BOSTROM A,BAGNALL A.Binary shapelet transform formulticlass time series classification[J/OL].Transactions on Large-Scale Data-and Knowledge-Centered Systems XXXII.Springer,2017:24-46.https://doi.org/10.1007/978-662-55608-5_2.
[18]BAGNALL A,LINES J,VICKERS W,et al.The UEA & UCR Time Series Classification Repository [EB/OL].http://www.timeseriesclassification.com.
[19]WANG Z,YAN W,OATES T.Time series classification from scratch with deep neural networks:a strong baseline[C]//2017 International Joint Conference on Neural Networks.IEEE,2017:1578-1585.
[20]LIN J,KEOGH E,WEI L,et al.Experiencing SAX:a novelsymbolic representation of time series[J].Data Mining and Knowledge Discovery,2007,15(2):107-144.
[21]LINES J,TAYLOR S,BAGNALL A.Time series classification with HIVE-COTE:the hierarchical vote collective of transformation-based ensembles[J].ACM Transactions on Knowledge Discovery from Data,2018,12(5):1-35.
[22]LIN J,KHADE R,LI Y.Rotation-invariant similarity in time series using bag-of-patterns representation[J].Journal of Intelligent Information Systems,2012,39(2):287-315.
[23]SENIN P,MALINCHIK S.SAX-VSM:interpretable time series classification using SAX and vector space model[C]//2013 IEEE 13rd International Conference on Data Mining.IEEE,2013:1175-1180.
[24]SCHÄFER P.The BOSS is concerned with time series classification in the presence of noise[J].Data Mining and Knowledge Discovery,2015,29(6):1505-1530.
[25]SCHÄFER P,LESER U.Fast and accurate time series classification with WEASEL[C]//Proceedings of the 2017 ACM Conference on Information and Knowledge Management.ACM,2017:637-646.
[26]SCHÄFER P,HÖGQVIST M.SFA:a symbolic fourier approximation and index for similarity search in high dimensional datasets[C]//Proceedings of the 15th International Conference on Extending Database Technology.Springer,2012:516-527.
[27]MIDDLEHURST M,VICKERS W,BAGNALL A.Scalable dictionary classifiers for time series classification[C]//Internatio-nal Conference on Intelligent Data Engineering and Automated Learning.Springer,2019:11-19.
[28]ZHANG W,WANG Z H,YUAN J D,et al.Time series discri-minative feature dictionary construction algorithm[J].Ruan Jian Xue Bao:Journal of Software,2020,31(10):3216-3237.
[29]YUAN J D,LIN Q H,ZHANG W,et al.Locally slope-based dynamic time warping for time series classification[C]//Procee-dings of the 28th ACM International Conference on Information and Knowledge Management.ACM,2019:1713-1722.
[30]BAGNALL A,LINES J,BOSTROM A,et al.The great time series classification bake off:A review and experimental evaluation of recent algorithmic advances[J].Data Mining and Knowledge Discovery,2017,31(3):606-660.
[31]LINES J,BAGNALL A.Time series classification with ensembles of elastic distance measures[J].Data Mining and Know-ledge Discovery,2015,29(3):565-592.
[32]BAGNALL A,LINES J,HILLS J,et al.Time-series classification with COTE:the collective of transformation-based ensembles[J].IEEE Transactions on Knowledge and Data Enginee-ring,2015,27(9):2522-2535.
[33]FULCHER B D,JONES N S.Hctsa:a computational framework for automated time-series phenotyping using massive feature extraction[J].Cell Systems,2017,5(5):527-531.
[34]DEMŠAR J.Statistical comparisons of classifiers over multiple datasets[J].Journal of Machine Learning Research,2006,7(1):1-30.
[1] CHEN Zhi-qiang, HAN Meng, LI Mu-hang, WU Hong-xin, ZHANG Xi-long. Survey of Concept Drift Handling Methods in Data Streams [J]. Computer Science, 2022, 49(9): 14-32.
[2] ZHOU Xu, QIAN Sheng-sheng, LI Zhang-ming, FANG Quan, XU Chang-sheng. Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification [J]. Computer Science, 2022, 49(9): 132-138.
[3] HAO Zhi-rong, CHEN Long, HUANG Jia-cheng. Class Discriminative Universal Adversarial Attack for Text Classification [J]. Computer Science, 2022, 49(8): 323-329.
[4] TAN Ying-ying, WANG Jun-li, ZHANG Chao-bo. Review of Text Classification Methods Based on Graph Convolutional Network [J]. Computer Science, 2022, 49(8): 205-216.
[5] YAN Jia-dan, JIA Cai-yan. Text Classification Method Based on Information Fusion of Dual-graph Neural Network [J]. Computer Science, 2022, 49(8): 230-236.
[6] WU Hong-xin, HAN Meng, CHEN Zhi-qiang, ZHANG Xi-long, LI Mu-hang. Survey of Multi-label Classification Based on Supervised and Semi-supervised Learning [J]. Computer Science, 2022, 49(8): 12-25.
[7] YANG Bing-xin, GUO Yan-rong, HAO Shi-jie, Hong Ri-chang. Application of Graph Neural Network Based on Data Augmentation and Model Ensemble in Depression Recognition [J]. Computer Science, 2022, 49(7): 57-63.
[8] HU Yan-yu, ZHAO Long, DONG Xiang-jun. Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification [J]. Computer Science, 2022, 49(7): 73-78.
[9] ZHANG Hong-bo, DONG Li-jia, PAN Yu-biao, HSIAO Tsung-chih, ZHANG Hui-zhen, DU Ji-xiang. Survey on Action Quality Assessment Methods in Video Understanding [J]. Computer Science, 2022, 49(7): 79-88.
[10] DU Li-jun, TANG Xi-lu, ZHOU Jiao, CHEN Yu-lan, CHENG Jian. Alzheimer's Disease Classification Method Based on Attention Mechanism and Multi-task Learning [J]. Computer Science, 2022, 49(6A): 60-65.
[11] LI Xiao-wei, SHU Hui, GUANG Yan, ZHAI Yi, YANG Zi-ji. Survey of the Application of Natural Language Processing for Resume Analysis [J]. Computer Science, 2022, 49(6A): 66-73.
[12] DENG Kai, YANG Pin, LI Yi-zhou, YANG Xing, ZENG Fan-rui, ZHANG Zhen-yu. Fast and Transmissible Domain Knowledge Graph Construction Method [J]. Computer Science, 2022, 49(6A): 100-108.
[13] HUANG Shao-bin, SUN Xue-wei, LI Rong-sheng. Relation Classification Method Based on Cross-sentence Contextual Information for Neural Network [J]. Computer Science, 2022, 49(6A): 119-124.
[14] LIN Xi, CHEN Zi-zhuo, WANG Zhong-qing. Aspect-level Sentiment Classification Based on Imbalanced Data and Ensemble Learning [J]. Computer Science, 2022, 49(6A): 144-149.
[15] KANG Yan, WU Zhi-wei, KOU Yong-qi, ZHANG Lan, XIE Si-yu, LI Hao. Deep Integrated Learning Software Requirement Classification Fusing Bert and Graph Convolution [J]. Computer Science, 2022, 49(6A): 150-158.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!