计算机科学 ›› 2022, Vol. 49 ›› Issue (7): 40-49.doi: 10.11896/jsjkx.210700226
所属专题: 大数据&数据科学 虚拟专题
高振卓, 王志海, 刘海洋
GAO Zhen-zhuo, WANG Zhi-hai, LIU Hai-yang
摘要: 近年来,时间序列分类问题的研究受到了广泛关注。先进的时间序列分类方法通常建立在良好的特征表示的基础之上。Shapelet是时间序列中具备鉴别性的子序列,可有效表达时间序列的局部形状特征。然而,高昂的计算成本大大限制了基于Shapelet的时间序列分类方法的实用性。除此之外,传统的Shapelet仅能描述欧氏距离度量下子序列的形状特征,因此极易受到噪声干扰并难以挖掘子序列中蕴含的其他类型的鉴别性信息。为应对上述问题,提出了一种新的时间序列分类算法——嵌入典型时间序列特征的随机Shapelet森林。该算法基于以下3个关键策略:1)随机选取Shapelet并限制Shapelet的作用范围以提高效率;2)在Shapelet中嵌入多个典型时间序列特征以提高算法对不同分类问题的适应性,并弥补随机选取Shapelet带来的精度损失;3)在新的特征表示的基础上构建随机森林分类器以确保算法的泛化能力。112个UCR时间序列数据集上的实验结果表明,本文算法的准确性超越了基于Shapelet精确搜索和Shapelet转换技术的STC算法,以及多个其他类型的先进时间序列分类算法。此外,广泛的实验对比验证了本文算法在效率上的显著优势。
中图分类号:
[1]GRABOCKA J,SCHILLING N,WISTUBA M,et al.Learningtime-series shapelets[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2014:392-401. [2]YAN W H,LI G L.Research on time series classification based on shapelet[J].Computer Science,2019,46(1):29-35. [3]MIDDLEHURST M,LARGE J,BAGNALL A.The canonical interval forest(CIF) classifier for time series classification[J].arXiv:2008.09172,2020. [4]LEE W,PARK S,JOO W,et al.Diagnosis prediction via medical context attention networks using deep generative modeling[C]//2018 IEEE International Conference on Data Mining.IEEE,2018:1104-1109. [5]LUCAS B,SHIFAZ A,PELLETIER C,et al.Proximity forest:an effective and scalable distance-based classifier for time series[J].Data Mining and Knowledge Discovery,2019,33(3):607-635. [6]FAWAZ H I,FORESTIER G,WEBER J,et al.Deep learningfor time series classification:a review[J].Data Mining and Knowledge Discovery,2019,33(4):917-963. [7]DAU H A,KEOGH E,KAMGAR K,et al.The UCR Time Series Classification Archive[EB/OL].https://www.cs.ucr.edu/~eamonn/time_series_data_2018. [8]YE L,KEOGH E.Time series shapelets:a novel technique that allows accurate,interpretable and fast classification[J].Data Mining and Knowledge Discovery,2011,22(1):149-182. [9]RAKTHANMANON T,KEOGH E.Fast shapelets:a scalablealgorithm for discovering time series shapelets[C]//Proceedings of the 13th SIAM International Conference on Data Mining.SIAM,2013:668-676. [10]KARLSSON I,PAPAPETROU P,BOSTRÖM H.Generalizedrandom shapelet forests[J].Data Mining and Knowledge Discovery,2016,30(5):1053-1085. [11]LI G L,YAN W H,WU Z D.Discovering shapelets with key points in time series classification[J].Expert Systems with Applications,2019,132:76-86. [12]ZHANG Z G,ZHANG H W,WEN Y L,et al.Discriminative extraction of features from time series[J].Neurocomputing,2018,275:2317-2328. [13]MUEEN A,KEOGH E,YOUNG N.Logical-shapelets:an expressive primitive for time series classification[C]//Proceedings of the 17th ACM SIGKDD International Conference on Know-ledge Discovery and Data Mining.ACM,2011:1154-1162. [14]DENG H,RUNGER G,TUV E,et al.A time series forest for classification and feature extraction[J].Information Sciences,2013,239:142-153. [15]LUBBA C H,SETHI S S,KNAUTE P,et al.Catch22:canonical time-series characteristics[J].Data Mining and Knowledge Discovery,2019,33(6):1821-1852. [16]HILLS J,LINES J,BARANAUSKAS E,et al.Classification of time series by shapelet transformation[J].Data Mining and Knowledge Discovery,2014,28(4):851-881. [17]BOSTROM A,BAGNALL A.Binary shapelet transform formulticlass time series classification[J/OL].Transactions on Large-Scale Data-and Knowledge-Centered Systems XXXII.Springer,2017:24-46.https://doi.org/10.1007/978-662-55608-5_2. [18]BAGNALL A,LINES J,VICKERS W,et al.The UEA & UCR Time Series Classification Repository [EB/OL].http://www.timeseriesclassification.com. [19]WANG Z,YAN W,OATES T.Time series classification from scratch with deep neural networks:a strong baseline[C]//2017 International Joint Conference on Neural Networks.IEEE,2017:1578-1585. [20]LIN J,KEOGH E,WEI L,et al.Experiencing SAX:a novelsymbolic representation of time series[J].Data Mining and Knowledge Discovery,2007,15(2):107-144. [21]LINES J,TAYLOR S,BAGNALL A.Time series classification with HIVE-COTE:the hierarchical vote collective of transformation-based ensembles[J].ACM Transactions on Knowledge Discovery from Data,2018,12(5):1-35. [22]LIN J,KHADE R,LI Y.Rotation-invariant similarity in time series using bag-of-patterns representation[J].Journal of Intelligent Information Systems,2012,39(2):287-315. [23]SENIN P,MALINCHIK S.SAX-VSM:interpretable time series classification using SAX and vector space model[C]//2013 IEEE 13rd International Conference on Data Mining.IEEE,2013:1175-1180. [24]SCHÄFER P.The BOSS is concerned with time series classification in the presence of noise[J].Data Mining and Knowledge Discovery,2015,29(6):1505-1530. [25]SCHÄFER P,LESER U.Fast and accurate time series classification with WEASEL[C]//Proceedings of the 2017 ACM Conference on Information and Knowledge Management.ACM,2017:637-646. [26]SCHÄFER P,HÖGQVIST M.SFA:a symbolic fourier approximation and index for similarity search in high dimensional datasets[C]//Proceedings of the 15th International Conference on Extending Database Technology.Springer,2012:516-527. [27]MIDDLEHURST M,VICKERS W,BAGNALL A.Scalable dictionary classifiers for time series classification[C]//Internatio-nal Conference on Intelligent Data Engineering and Automated Learning.Springer,2019:11-19. [28]ZHANG W,WANG Z H,YUAN J D,et al.Time series discri-minative feature dictionary construction algorithm[J].Ruan Jian Xue Bao:Journal of Software,2020,31(10):3216-3237. [29]YUAN J D,LIN Q H,ZHANG W,et al.Locally slope-based dynamic time warping for time series classification[C]//Procee-dings of the 28th ACM International Conference on Information and Knowledge Management.ACM,2019:1713-1722. [30]BAGNALL A,LINES J,BOSTROM A,et al.The great time series classification bake off:A review and experimental evaluation of recent algorithmic advances[J].Data Mining and Knowledge Discovery,2017,31(3):606-660. [31]LINES J,BAGNALL A.Time series classification with ensembles of elastic distance measures[J].Data Mining and Know-ledge Discovery,2015,29(3):565-592. [32]BAGNALL A,LINES J,HILLS J,et al.Time-series classification with COTE:the collective of transformation-based ensembles[J].IEEE Transactions on Knowledge and Data Enginee-ring,2015,27(9):2522-2535. [33]FULCHER B D,JONES N S.Hctsa:a computational framework for automated time-series phenotyping using massive feature extraction[J].Cell Systems,2017,5(5):527-531. [34]DEMAR J.Statistical comparisons of classifiers over multiple datasets[J].Journal of Machine Learning Research,2006,7(1):1-30. |
[1] | 陈志强, 韩萌, 李慕航, 武红鑫, 张喜龙. 数据流概念漂移处理方法研究综述 Survey of Concept Drift Handling Methods in Data Streams 计算机科学, 2022, 49(9): 14-32. https://doi.org/10.11896/jsjkx.210700112 |
[2] | 周旭, 钱胜胜, 李章明, 方全, 徐常胜. 基于对偶变分多模态注意力网络的不完备社会事件分类方法 Dual Variational Multi-modal Attention Network for Incomplete Social Event Classification 计算机科学, 2022, 49(9): 132-138. https://doi.org/10.11896/jsjkx.220600022 |
[3] | 郝志荣, 陈龙, 黄嘉成. 面向文本分类的类别区分式通用对抗攻击方法 Class Discriminative Universal Adversarial Attack for Text Classification 计算机科学, 2022, 49(8): 323-329. https://doi.org/10.11896/jsjkx.220200077 |
[4] | 武红鑫, 韩萌, 陈志强, 张喜龙, 李慕航. 监督和半监督学习下的多标签分类综述 Survey of Multi-label Classification Based on Supervised and Semi-supervised Learning 计算机科学, 2022, 49(8): 12-25. https://doi.org/10.11896/jsjkx.210700111 |
[5] | 檀莹莹, 王俊丽, 张超波. 基于图卷积神经网络的文本分类方法研究综述 Review of Text Classification Methods Based on Graph Convolutional Network 计算机科学, 2022, 49(8): 205-216. https://doi.org/10.11896/jsjkx.210800064 |
[6] | 闫佳丹, 贾彩燕. 基于双图神经网络信息融合的文本分类方法 Text Classification Method Based on Information Fusion of Dual-graph Neural Network 计算机科学, 2022, 49(8): 230-236. https://doi.org/10.11896/jsjkx.210600042 |
[7] | 杨炳新, 郭艳蓉, 郝世杰, 洪日昌. 基于数据增广和模型集成策略的图神经网络在抑郁症识别上的应用 Application of Graph Neural Network Based on Data Augmentation and Model Ensemble in Depression Recognition 计算机科学, 2022, 49(7): 57-63. https://doi.org/10.11896/jsjkx.210800070 |
[8] | 胡艳羽, 赵龙, 董祥军. 一种用于癌症分类的两阶段深度特征选择提取算法 Two-stage Deep Feature Selection Extraction Algorithm for Cancer Classification 计算机科学, 2022, 49(7): 73-78. https://doi.org/10.11896/jsjkx.210500092 |
[9] | 张洪博, 董力嘉, 潘玉彪, 萧宗志, 张惠臻, 杜吉祥. 视频理解中的动作质量评估方法综述 Survey on Action Quality Assessment Methods in Video Understanding 计算机科学, 2022, 49(7): 79-88. https://doi.org/10.11896/jsjkx.210600028 |
[10] | 杜丽君, 唐玺璐, 周娇, 陈玉兰, 程建. 基于注意力机制和多任务学习的阿尔茨海默症分类 Alzheimer's Disease Classification Method Based on Attention Mechanism and Multi-task Learning 计算机科学, 2022, 49(6A): 60-65. https://doi.org/10.11896/jsjkx.201200072 |
[11] | 李小伟, 舒辉, 光焱, 翟懿, 杨资集. 自然语言处理在简历分析中的应用研究综述 Survey of the Application of Natural Language Processing for Resume Analysis 计算机科学, 2022, 49(6A): 66-73. https://doi.org/10.11896/jsjkx.210600134 |
[12] | 邓凯, 杨频, 李益洲, 杨星, 曾凡瑞, 张振毓. 一种可快速迁移的领域知识图谱构建方法 Fast and Transmissible Domain Knowledge Graph Construction Method 计算机科学, 2022, 49(6A): 100-108. https://doi.org/10.11896/jsjkx.210900018 |
[13] | 黄少滨, 孙雪薇, 李熔盛. 基于跨句上下文信息的神经网络关系分类方法 Relation Classification Method Based on Cross-sentence Contextual Information for Neural Network 计算机科学, 2022, 49(6A): 119-124. https://doi.org/10.11896/jsjkx.210600150 |
[14] | 林夕, 陈孜卓, 王中卿. 基于不平衡数据与集成学习的属性级情感分类 Aspect-level Sentiment Classification Based on Imbalanced Data and Ensemble Learning 计算机科学, 2022, 49(6A): 144-149. https://doi.org/10.11896/jsjkx.210500205 |
[15] | 康雁, 吴志伟, 寇勇奇, 张兰, 谢思宇, 李浩. 融合Bert和图卷积的深度集成学习软件需求分类 Deep Integrated Learning Software Requirement Classification Fusing Bert and Graph Convolution 计算机科学, 2022, 49(6A): 150-158. https://doi.org/10.11896/jsjkx.210500065 |
|