计算机科学 ›› 2024, Vol. 51 ›› Issue (8): 106-116.doi: 10.11896/jsjkx.230500161
王礼勤, 万源, 罗颖
WANG Liqin, WAN Yuan, LUO Ying
摘要: 在时序分类问题中,基于符号表示的shapelets提取方法具有良好的分类精度和分类效率,但对符号进行质量度量的过程,如计算TFIDF分数,耗时较长且计算量大,导致分类效率较低。此外,提取的shapelets候选数量仍然较多,判别力有待提高。针对这些问题,本文提出了一种基于符号表示的可度量shapelets提取方法,该方法包含时间序列数据预处理、确定shapelets候选集和学习shapelets 3个阶段,可以快速得到高质量shapelets。在数据预处理阶段,将时间序列转化为符号聚合近似(SAX)表示以降低原始时间序列的维度。在确定shapelets候选集阶段,利用Bloom过滤器过滤重复的SAX词,并将过滤后的SAX词存储在哈希表中进行质量度量。随后,对SAX词的相似性进行判别,基于相似性和覆盖度等概念确定最终的shapelets候选集。在学习shapelets阶段,采用logistic回归模型学得真正的shapelets用于时序分类。在32个数据集上进行了大量实验,实验结果表明,所提方法的平均分类精度和平均分类效率均排名第二。与现有的基于shapelets的时序分类方法相比,该方法可以在保证精度的同时提高分类效率,并且具有良好的可解释性。
中图分类号:
[1]AHMED T,SINGH D.Probability density functions based classification of MODIS NDVI time series data and monitoring of vegetation growth cycle[J].Advances in Space Research,2020,66(4):873-886. [2]AL-HADEETH I,ABDULLA S,DIYKH M,et al.Adaptiveboost LS-SVM classification approach for time-series signal classification in epileptic seizure diagnosis applications[J].Expert Systems with Applications,2020,161:113676. [3]YEUNG J,WEI Z,CHAN K Y,et al.Jump detection in financial time series using machine learning algorithms[J].Soft Computing,2020,24(3):1789-1801. [4]BAGNALL A,LINES J,BOSTROMA,et al.The great time series classification bake off:a review and experimental evaluation of recent algorithmic advances[J].Data Mining and Knowledge Discovery,2017,31(3):606-660. [5]XI X P,KEOGH E,SHELTONC,et al.Fast time series classification using numerosity reduction[C]//Proceedings of the 23rd International Conference on Machine learning.New York:ACM,2006:1033-1040. [6]SAKOE H,CHIBA S.Dynamic programming algorithm optimization for spoken word recognition[J].IEEE Transactions on Acoustics Speech and Signal Processing,1978,26(1):43-49. [7]YE L X,KEOGH E.Time series shapelets:A new primitive for data mining[C]//Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2009:947-956. [8]GRABOCKA,WISTUBA M.Schmidt-Thieme.Fast classifica-tion of univariate and multivariate time series through shapelet discovery[J].Knowledge and Information Systems,2016,49(2):429-454. [9]JI C,ZHAO C,LIUS J,et al.A fast shapelet selection algorithm for time series classification[J].Computer Networks,2019,148:231-240. [10]ZOU X N,ZHENG X W,JI C,et al.An improved fast shapelet selection algorithm and its application to pervasive EEG[J].Personal Ubiquitous Comput,2022,26(4):941-953. [11]GRABOCKA J,SCHILLING N,WISTUBA M,et al.Learning time-series shapelets[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York:ACM,2014:392-401. [12]HOU L,KWOK J T,ZURADA J M.Efficient learning of time series shapelets[C]//Proceedings of the 13th AAAI Conference on Artificial Intelligence.Phoenix:AAAI Press,2016:1209-1215. [13]ZHANG Z,ZHANG H,WEN Y,et al.Discriminative extraction of features from time series[J].Neurocomputing,2018,275:2317-2328. [14]RAKTHANMANON T,KEOGH E.Fast shapelets:A scalable algorithm for discovering time series shapelets[C]//Proceedings of the 2013 SIAM International Conference on Data Mining.Philadelphia:SIAM,2013:668-676. [15]FANG Z,WANG P,WANG W.Efficient learning interpretableshapelets for accurate time series classification[C]//2018 IEEE 34th International Conference on Data Engineering.Paris:IEEE,2018:497-508. [16]LI G Z,BYRON C,XU J L,et al.Efficient Shapelet Discovery for Time Series Classification[J].IEEE Transactions on Know-ledge and Data Engineering,2022,34(3):1149-1163. [17]KRUSKALW H.A nonparametric test for the several sample problem[J].Annals of Mathematical Statistics,1952,23(4):525-540. [18]HILLS J,LINES J,BARANAUSKAS E,et al.Classification of time series by shapelet transformation[J].Data Mining and Knowledge Discovery,2014,28(4):851-881. [19]FAWAZ H I,FORESTIER G,WEBER J,et al.Deep learning for time series classification:A review[J].Data Mining and Knowledge Discovery,2019,33(4):917-963. [20]MUEEN A,KEOGH E,YOUNG N.Logical-shapelets:An expressive primitive for time series classification[C]//Proceedings of the 17th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.New York:ACM,2011:1154-1162. [21]YUAN J D,WANG Z H,HAN M.et al.A logical shapelets transformation for time series classification[J].Chinese Journal of Computers,2015,38:1448-1459. [22]LINES J,DAVIS L M,HILLS J,et al.A shapelet transform for time series classification[C]//Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Beijing:ACM,2012:289-297. [23]ZHAO H Y,PAN Z S,TAO W.Regularized shapelet learning for scalable time series classification[J].Compute Networks,2020,173:107171. [24]LIN J,KEOGH E,WEI L,et al.Experiencing sax:A novel symbolic representation of time series[J].Data Mining and Know-ledge Discovery,2007,15(2):107-144. [25]SCHAFER P,LESER U.Fast and accurate time series classification with weasel[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Management.Singapore:ACM,2017:637-646. [26]SENIN P,MALINCHIK S.Sax-VSM:Interpretable time series classification using sax and vector space model[C]//IEEE 13th International Conference on Data Engineering.Dallas:IEEE,2013:1175-1180. [27]NGUYEN T L,GSPONER S,IFRIM G.Time series classification by sequence learning in all-subsequence space[C]//2017 IEEE 33rd International Conference on Data Engineering.San Diego:IEEE,2017:947-958. [28]IFRIM G,WIUF C.Bounded coordinate-descent for biological sequence classification in high dimensional predictor space[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.San Diego:ACM,2011:708-716. [29]LIANG Z Y,WANG H Z.Efficient Class-Specific shapeletslearning for interpretable time series classification[J].Information Sciences,2021,570:428-450. [30]ZHANG H,WANG P,FANGZ,et al.ELIS++:a shapelet learning approach for accurate and efficient time series classification[J].World Wide Web,24(2):511-539. [31]BAGNALL A,LINES J,HILLS J,et al.Time-series classification with COTE:The collective of transformation-based ensembles[J].IEEE Transactions on Knowledge and Data Enginee-ring,2015,27(2):2522-2535. [32]FAWAZ H I,FORESTIER G,WEBERJ,et al.Deep learning for time series classification:A review[J].Data Mining and Know-ledge Discovery,2019,33(4):917-963. [33]LARSEN R J,MARX M L.An Introduction to Mathematical Statistics and its Applications[M].London:Prentice Hall,2011. |
|