Computer Science ›› 2024, Vol. 51 ›› Issue (8): 106-116.doi: 10.11896/jsjkx.230500161

• Database & Big Data & Data Science • Previous Articles     Next Articles

Measurable Shapelets Extraction Based on Symbolic Rrepresentation for Time Series Classification

WANG Liqin, WAN Yuan, LUO Ying   

  1. School of Science,Wuhan University of Technology,Wuhan 430070,China
  • Received:2023-05-24 Revised:2023-10-22 Online:2024-08-15 Published:2024-08-13
  • About author:WANG Liqin,born in 1998,postgra-duate.Her main research interests include data mining,machine learning and pattern recognition.
    WAN Yuan,born in 1976,Ph.D,professor.Her main research interests include data mining,pattern recognition,manifold learning,machine learning and feature selection.
  • Supported by:
    Fundamental Research Funds for the Central Universities of Ministry of Education of China(2021III030JC).

Abstract: In the time series classification problems,shapelets extraction method based on symbol representation has good classification accuracy and efficiency,but the quality measurement of symbols,such as calculating TFIDF scores,is time-consuming and computatively heavy,leading to low classification efficiency.In addition,there are still a large number of shapelets candidates extracted,and the discriminating power needs to be improved.To solve these problems,this paper proposes a measurable shapelets extraction method based on symbolic representation,which includes three stages:time series data preprocessing,determining shapelets candidate set and learning shapelets,so that high-quality shapelets can be obtained quickly.In the data preprocessing stage,the time series is transformed into a symbolic aggregation approximation(SAX)representation to reduce the dimensions of the original time series.In the stage of determining the candidate set of shapelets,Bloom filters are used to filter repeated SAX words,and the filtered SAX words are stored in the hash table for quality measurement.Then,the similarity of SAX words is discriminated,and the final shapelets candidate set is determined based on the concepts of similarity and coverage.In the learning phase of shapelets,the logistic regression model is used to learn real shapelets for time series classification.In this paper,a large number of experiments are conducted on 32 datasets,and the experimental results show that the average classification accuracy and average classification efficiency of the proposed method rank second on 32 datasets.Compared with the existing time series classification methods based on shapelets,the proposed method can improve the classification efficiency while ensuring the accuracy,and has good interpretability.

Key words: Time series classification, Shapelet, SAX means, Bloom filters, Logistic regression

CLC Number: 

  • TP391
[1]AHMED T,SINGH D.Probability density functions based classification of MODIS NDVI time series data and monitoring of vegetation growth cycle[J].Advances in Space Research,2020,66(4):873-886.
[2]AL-HADEETH I,ABDULLA S,DIYKH M,et al.Adaptiveboost LS-SVM classification approach for time-series signal classification in epileptic seizure diagnosis applications[J].Expert Systems with Applications,2020,161:113676.
[3]YEUNG J,WEI Z,CHAN K Y,et al.Jump detection in financial time series using machine learning algorithms[J].Soft Computing,2020,24(3):1789-1801.
[4]BAGNALL A,LINES J,BOSTROMA,et al.The great time series classification bake off:a review and experimental evaluation of recent algorithmic advances[J].Data Mining and Knowledge Discovery,2017,31(3):606-660.
[5]XI X P,KEOGH E,SHELTONC,et al.Fast time series classification using numerosity reduction[C]//Proceedings of the 23rd International Conference on Machine learning.New York:ACM,2006:1033-1040.
[6]SAKOE H,CHIBA S.Dynamic programming algorithm optimization for spoken word recognition[J].IEEE Transactions on Acoustics Speech and Signal Processing,1978,26(1):43-49.
[7]YE L X,KEOGH E.Time series shapelets:A new primitive for data mining[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2009:947-956.
[8]GRABOCKA,WISTUBA M.Schmidt-Thieme.Fast classifica-tion of univariate and multivariate time series through shapelet discovery[J].Knowledge and Information Systems,2016,49(2):429-454.
[9]JI C,ZHAO C,LIUS J,et al.A fast shapelet selection algorithm for time series classification[J].Computer Networks,2019,148:231-240.
[10]ZOU X N,ZHENG X W,JI C,et al.An improved fast shapelet selection algorithm and its application to pervasive EEG[J].Personal Ubiquitous Comput,2022,26(4):941-953.
[11]GRABOCKA J,SCHILLING N,WISTUBA M,et al.Learning time-series shapelets[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2014:392-401.
[12]HOU L,KWOK J T,ZURADA J M.Efficient learning of time series shapelets[C]//Proceedings of the 13th AAAI Conference on Artificial Intelligence.Phoenix:AAAI Press,2016:1209-1215.
[13]ZHANG Z,ZHANG H,WEN Y,et al.Discriminative extraction of features from time series[J].Neurocomputing,2018,275:2317-2328.
[14]RAKTHANMANON T,KEOGH E.Fast shapelets:A scalable algorithm for discovering time series shapelets[C]//Proceedings of the 2013 SIAM International Conference on Data Mining.Philadelphia:SIAM,2013:668-676.
[15]FANG Z,WANG P,WANG W.Efficient learning interpretableshapelets for accurate time series classification[C]//2018 IEEE 34th International Conference on Data Engineering.Paris:IEEE,2018:497-508.
[16]LI G Z,BYRON C,XU J L,et al.Efficient Shapelet Discovery for Time Series Classification[J].IEEE Transactions on Know-ledge and Data Engineering,2022,34(3):1149-1163.
[17]KRUSKALW H.A nonparametric test for the several sample problem[J].Annals of Mathematical Statistics,1952,23(4):525-540.
[18]HILLS J,LINES J,BARANAUSKAS E,et al.Classification of time series by shapelet transformation[J].Data Mining and Knowledge Discovery,2014,28(4):851-881.
[19]FAWAZ H I,FORESTIER G,WEBER J,et al.Deep learning for time series classification:A review[J].Data Mining and Knowledge Discovery,2019,33(4):917-963.
[20]MUEEN A,KEOGH E,YOUNG N.Logical-shapelets:An expressive primitive for time series classification[C]//Proceedings of the 17th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.New York:ACM,2011:1154-1162.
[21]YUAN J D,WANG Z H,HAN M.et al.A logical shapelets transformation for time series classification[J].Chinese Journal of Computers,2015,38:1448-1459.
[22]LINES J,DAVIS L M,HILLS J,et al.A shapelet transform for time series classification[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Beijing:ACM,2012:289-297.
[23]ZHAO H Y,PAN Z S,TAO W.Regularized shapelet learning for scalable time series classification[J].Compute Networks,2020,173:107171.
[24]LIN J,KEOGH E,WEI L,et al.Experiencing sax:A novel symbolic representation of time series[J].Data Mining and Know-ledge Discovery,2007,15(2):107-144.
[25]SCHAFER P,LESER U.Fast and accurate time series classification with weasel[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.Singapore:ACM,2017:637-646.
[26]SENIN P,MALINCHIK S.Sax-VSM:Interpretable time series classification using sax and vector space model[C]//IEEE 13th International Conference on Data Engineering.Dallas:IEEE,2013:1175-1180.
[27]NGUYEN T L,GSPONER S,IFRIM G.Time series classification by sequence learning in all-subsequence space[C]//2017 IEEE 33rd International Conference on Data Engineering.San Diego:IEEE,2017:947-958.
[28]IFRIM G,WIUF C.Bounded coordinate-descent for biological sequence classification in high dimensional predictor space[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.San Diego:ACM,2011:708-716.
[29]LIANG Z Y,WANG H Z.Efficient Class-Specific shapeletslearning for interpretable time series classification[J].Information Sciences,2021,570:428-450.
[30]ZHANG H,WANG P,FANGZ,et al.ELIS++:a shapelet learning approach for accurate and efficient time series classification[J].World Wide Web,24(2):511-539.
[31]BAGNALL A,LINES J,HILLS J,et al.Time-series classification with COTE:The collective of transformation-based ensembles[J].IEEE Transactions on Knowledge and Data Enginee-ring,2015,27(2):2522-2535.
[32]FAWAZ H I,FORESTIER G,WEBERJ,et al.Deep learning for time series classification:A review[J].Data Mining and Know-ledge Discovery,2019,33(4):917-963.
[33]LARSEN R J,MARX M L.An Introduction to Mathematical Statistics and its Applications[M].London:Prentice Hall,2011.
[1] LUO Ying, WAN Yuan, WANG Liqin. Category-specific and Diverse Shapelets Extraction for Time Series Based on Adversarial Strategies [J]. Computer Science, 2024, 51(5): 35-44.
[2] QIAO Fan, WANG Peng, WANG Wei. Multivariate Time Series Classification Algorithm Based on Heterogeneous Feature Fusion [J]. Computer Science, 2024, 51(2): 36-46.
[3] LI Chen, WAN Yuan. Study on Time Series Shapelets Extraction Based on Optimization and Two-phase Filtering [J]. Computer Science, 2023, 50(2): 146-157.
[4] DUAN Mengmeng, JIN Cheng. Transformer Feature Fusion Network for Time Series Classification [J]. Computer Science, 2023, 50(12): 97-103.
[5] GAO Zhen-zhuo, WANG Zhi-hai, LIU Hai-yang. Random Shapelet Forest Algorithm Embedded with Canonical Time Series Features [J]. Computer Science, 2022, 49(7): 40-49.
[6] LI Jing-tai, WANG Xiao-dan. XGBoost for Imbalanced Data Based on Cost-sensitive Activation Function [J]. Computer Science, 2022, 49(5): 135-143.
[7] WANG Dong-xia, LEI Yong-mei, ZHANG Ze-yu. Communication Efficient Asynchronous ADMM for General Form Consensus Optimization [J]. Computer Science, 2022, 49(11): 309-315.
[8] YANG Wen-bo, YUAN Ji-dong. Locally Black-box Adversarial Attack on Time Series [J]. Computer Science, 2022, 49(10): 285-290.
[9] HU Yan-mei, YANG Bo, DUO Bin. Logistic Regression with Regularization Based on Network Structure [J]. Computer Science, 2021, 48(7): 281-291.
[10] YU Meng-chi, MU Jia-peng, CAI Jian, XU Jian. Noisy Label Classification Learning Based on Relabeling Method [J]. Computer Science, 2020, 47(6): 79-84.
[11] BIAN Yu-ning, LU Li-kun, LI Ye-li, ZENG Qing-tao, SUN Yan-xiong. Implementation of Financial Venture Capital Score Card Model Based on Logistic Regression [J]. Computer Science, 2020, 47(11A): 116-118.
[12] LIU Meng-juan,ZENG Gui-chuan,YUE Wei,QIU Li-zhou,WANG Jia-chang. Review on Click-through Rate Prediction Models for Display Advertising [J]. Computer Science, 2019, 46(7): 38-49.
[13] ZHANG Shi-xiang, LI Wang-geng, LI Tong, ZHU Nan-nan. Improved CoreSets Construction Algorithm for Bayesian Logistic Regression [J]. Computer Science, 2019, 46(11A): 98-102.
[14] YAN Wen-he, LI Gui-ling. Research on Time Series Classification Based on Shapelet [J]. Computer Science, 2019, 46(1): 29-35.
[15] ZHAO Hui-yun and PAN Zhi-song. Multivariate Time Series Classification Based on Shapelets Learning [J]. Computer Science, 2018, 45(5): 180-184.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!