Computer Science ›› 2024, Vol. 51 ›› Issue (2): 63-72. doi: 10.11896/jsjkx.221200038

• Database & Big Data & Data Science •

Time Series Clustering Method Based on Contrastive Learning

YANG Bo1,2, LUO Jiachen1,2, SONG Yantao1,2, WU Hongtao3, PENG Furong1,2

  1. Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China
  2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China
  3. Shanxi Transportation Technology R&D Co., Ltd., Taiyuan 030006, China
  • Received: 2022-12-06  Revised: 2023-02-28  Online: 2024-02-15  Published: 2024-02-22
  • Corresponding author: PENG Furong (pengfr@sxu.edu.cn)
  • About author: YANG Bo, born in 1998, postgraduate (yangbo981205@gmail.com). His main research interest is time series analysis. PENG Furong, born in 1987, Ph.D., associate professor, master's supervisor. His main research interests include data mining and recommendation systems.
  • Supported by: National Natural Science Foundation of China (62276162), Key R&D Program of Shanxi Province, China (202102070301019), Basic Research Program of Shanxi Province (201901D211170, 202103021223464) and Nanjing International Joint R&D Project (202002021).

Abstract: Existing deep clustering methods rely heavily on complex feature extraction networks and clustering algorithms, which makes it difficult to define the similarity of time series intuitively. Contrastive learning can define the interval similarity of time series from the perspective of positive and negative samples and jointly optimize feature extraction and clustering. Based on this idea, this paper proposes a time series clustering model that does not rely on a complex representation network. To address the problem that existing time series data augmentation methods cannot describe the transformation invariance of time series, a data augmentation method based on the shape features of time series is further proposed, which captures the similarity of sequences while ignoring their time-domain characteristics. The model constructs positive and negative sample pairs by setting different shape transformation parameters, learns feature representations and projects them into a feature space, and uses a cross-entropy loss at both the instance level and the cluster level to maximize the similarity of positive pairs and minimize that of negative pairs, so that representation learning and cluster assignment are jointly optimized end to end. Extensive experiments on 32 UCR datasets show that the proposed model achieves clustering results comparable to or better than those of existing methods without relying on a specific representation learning network.
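The abstract only states that positive and negative pairs are built by applying different shape transformation parameters to the series; the transformation itself is not specified. As a minimal illustration of the pair-construction idea, the NumPy sketch below applies two differently parameterised smooth magnitude warps (used here as the "shape" transformation, an assumption rather than the authors' augmentation) to each series: the two views of the same series form a positive pair, while views of different series in the batch act as negatives.

import numpy as np

def shape_transform(x, strength=0.2, knots=4, rng=None):
    # Warp the magnitude of a 1-D series with a smooth random curve.
    # This is only one plausible "shape transformation"; the paper's own
    # parameterisation is not given in the abstract.
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(0.0, 1.0, num=len(x))
    anchor_t = np.linspace(0.0, 1.0, num=knots)
    anchor_y = 1.0 + rng.normal(0.0, strength, size=knots)
    warp = np.interp(t, anchor_t, anchor_y)      # smooth multiplicative curve
    return x * warp

def make_pair_views(batch, rng=None):
    # Two differently transformed views of every series: same-series views are
    # positives, all cross-series combinations in the batch serve as negatives.
    rng = np.random.default_rng() if rng is None else rng
    view_a = np.stack([shape_transform(x, strength=0.1, rng=rng) for x in batch])
    view_b = np.stack([shape_transform(x, strength=0.3, rng=rng) for x in batch])
    return view_a, view_b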

Key words: Time series clustering, Contrastive learning, Data augmentation, Representation learning, Joint optimization
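To show how an instance-level and a cluster-level contrast can share one encoder and a cross-entropy-style loss, the PyTorch sketch below follows the generic contrastive-clustering recipe: a deliberately simple encoder, an instance projection head, a softmax cluster head, and an NT-Xent loss applied to sample pairs (instance level) and to the columns of the soft assignment matrices (cluster level). Layer sizes and head designs are illustrative placeholders, not the authors' released architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveClusterer(nn.Module):
    def __init__(self, series_len, feat_dim=64, n_clusters=10):
        super().__init__()
        # A deliberately small encoder: the model is meant to work without a
        # complex representation network.
        self.encoder = nn.Sequential(
            nn.Linear(series_len, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )
        self.instance_head = nn.Linear(feat_dim, feat_dim)      # instance-level projection
        self.cluster_head = nn.Sequential(                      # cluster-level soft assignment
            nn.Linear(feat_dim, n_clusters), nn.Softmax(dim=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        return self.instance_head(h), self.cluster_head(h)

def nt_xent(z_a, z_b, temperature=0.5):
    # Cross-entropy form of the pairwise contrastive loss: each row's positive
    # is its counterpart from the other view, everything else is a negative.
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)
    n = z_a.size(0)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(-1e9)                                     # mask self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

def joint_loss(model, view_a, view_b):
    z_a, q_a = model(view_a)
    z_b, q_b = model(view_b)
    instance_loss = nt_xent(z_a, z_b)            # contrast individual samples
    cluster_loss = nt_xent(q_a.t(), q_b.t())     # contrast cluster-assignment columns
    return instance_loss + cluster_loss

Backpropagating joint_loss on the two augmented views produced above trains the encoder, the instance head and the cluster head together, so representation learning and cluster assignment stay end to end; at inference the cluster of a series is the argmax of its cluster-head output.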

CLC number: TP183
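Since the experiments are reported on 32 datasets from the UCR archive, evaluation typically compares the predicted assignments against the archive's ground-truth labels with permutation-invariant metrics. The sketch below uses scikit-learn for the metrics; the file path refers to a local copy of the 2018 UCR archive (tab-separated, label in the first column) and is only illustrative, and k-means on the raw series stands in for the trained model's cluster head.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def load_ucr_split(path):
    # One UCR split in the 2018 archive layout: the first column is the label.
    data = np.loadtxt(path, delimiter='\t')
    return data[:, 1:], data[:, 0].astype(int)

# Illustrative path into a local UCR copy; substitute any of the 32 datasets.
series, labels = load_ucr_split('UCRArchive_2018/Coffee/Coffee_TEST.tsv')

# Stand-in assignment: with the trained model one would instead take the
# argmax of the cluster-level head for each series.
pred = KMeans(n_clusters=len(set(labels)), n_init=10).fit_predict(series)

print('NMI:', normalized_mutual_info_score(labels, pred))
print('ARI:', adjusted_rand_score(labels, pred))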