Computer Science, 2024, Vol. 51, Issue (2): 36-46. doi: 10.11896/jsjkx.230100135

• Database & Big Data & Data Science •

Multivariate Time Series Classification Algorithm Based on Heterogeneous Feature Fusion

QIAO Fan (乔帆)1, WANG Peng (王鹏)2, WANG Wei (汪卫)2

  1 School of Software, Fudan University, Shanghai 200438, China
  2 School of Computer Science, Fudan University, Shanghai 200438, China
  • Received: 2023-01-31  Revised: 2023-05-16  Online: 2024-02-15  Published: 2024-02-22
  • Corresponding author: WANG Peng (pengwang5@fudan.edu.cn)
  • About author: QIAO Fan (fqiao20@fudan.edu.cn), born in 1998, postgraduate. Her main research interests include databases, data mining, and information retrieval. WANG Peng, born in 1979, Ph.D., professor, is a member of CCF (No. 41708M). His main research interests include databases, data mining, and series data processing.
  • Supported by: Key Research and Development Program of the Ministry of Science and Technology of China (2020YFB1710001).

Abstract: With the advance of big data and the development of sensors, multivariate time series classification has become an important problem in data mining. Multivariate time series are characterized by high dimensionality, complex inter-dimensional relations, and variable data forms, which together produce a huge feature space. Existing methods therefore have difficulty selecting discriminative features, which leads to generally low accuracy, and their classification results are poorly interpretable. To address these problems, this paper proposes a multivariate time series classification algorithm based on heterogeneous feature fusion. The algorithm integrates three types of features, namely time-domain, frequency-domain, and interval-statistic features, and clusters them to find the most representative ones. First, a small number of representative features of each type are extracted for every dimension. Then, the features of all types and all dimensions are fused into a single feature vector by multivariate feature transformation, and the classifier is trained on these vectors. For univariate feature extraction, the algorithm generates candidate features of each type based on a tree structure, and a clustering algorithm then aggregates redundant and similar candidates into a small number of representative features, which effectively reduces the number of features and improves the interpretability of the method. To verify the effectiveness of the algorithm, extensive experiments are conducted on the public UEA archive, comparing the proposed algorithm with existing multivariate time series classification methods. The results show that the proposed algorithm outperforms the existing methods in accuracy, in the soundness of its feature fusion, and in the interpretability of its classification results, the last of which is illustrated by a case study.

Key words: Multivariate time series, Time series classification, Feature fusion, Interpretability, Feature clustering

CLC number: TP391