Computer Science ›› 2025, Vol. 52 ›› Issue (6): 96-105. doi: 10.11896/jsjkx.240500043

• Database & Big Data & Data Science •

Survey of Transformer-based Time Series Forecasting Methods

CHEN Jiajun1, LIU Bo1,3, LIN Weiwei2, ZHENG Jianwen3, XIE Jiachen3   

  1 School of Artificial Intelligence, South China Normal University, Guangzhou 510631, China
    2 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
    3 School of Computer Science, South China Normal University, Guangzhou 510631, China
  • Received: 2024-05-10 Revised: 2024-10-13 Online: 2025-06-15 Published: 2025-06-11
  • Corresponding author: LIN Weiwei (linww@scut.edu.cn)
  • About author: (1046696528@qq.com)
  • Supported by:
    National Natural Science Foundation of China (General Program) (62072187); Guangzhou Development Zone International Cooperation Project (2023GH02)

Survey of Transformer-based Time Series Forecasting Methods

CHEN Jiajun1, LIU Bo1,3, LIN Weiwei2, ZHENG Jianwen3, XIE Jiachen3   

  1 School of Artificial Intelligence, South China Normal University, Guangzhou 510631, China
    2 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
    3 School of Computer Science, South China Normal University, Guangzhou 510631, China
  • Received: 2024-05-10 Revised: 2024-10-13 Online: 2025-06-15 Published: 2025-06-11
  • About author: CHEN Jiajun, born in 2001, postgraduate. His main research interests include time series forecasting.
    LIN Weiwei, born in 1980, Ph.D, professor, is a distinguished member of CCF (No.37313D). His main research interests include cloud computing, big data technology and AI application technology.
  • Supported by:
    National Natural Science Foundation of China (62072187) and Guangzhou Development Zone Science and Technology Project (2023GH02).

Abstract: Time series forecasting, a key technique for analyzing historical data to predict future trends, has been widely applied in fields such as finance and meteorology. However, traditional methods such as the autoregressive moving average model and exponential smoothing are limited when handling nonlinear patterns and capturing long-term dependencies. Recently, Transformer-based methods, owing to their self-attention mechanism, have achieved breakthroughs in natural language processing and computer vision, and have begun to extend into time series forecasting with remarkable results. Exploring how to apply Transformers efficiently to time series forecasting has therefore become key to advancing the field. This paper first introduces the characteristics of time series and describes the common task categories and evaluation metrics of time series forecasting. It then analyzes the basic Transformer architecture in depth, selects Transformer-derived models that have attracted wide attention in time series forecasting in recent years, classifies them at the module and architecture levels, and compares and analyzes them along three dimensions: the problems they solve, their innovations, and their limitations. Finally, it discusses possible future research directions for Transformers in time series forecasting.
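The evaluation metrics mentioned above most commonly include mean squared error (MSE) and mean absolute error (MAE). A minimal sketch illustrates both (the toy forecast values below are arbitrary, not from any surveyed benchmark):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: penalizes large deviations quadratically."""
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    """Mean absolute error: less sensitive to occasional large errors."""
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy one-step-ahead forecasts against observed values
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(mse(y_true, y_pred))  # ≈ 0.025
print(mae(y_true, y_pred))  # ≈ 0.15
```

Because MSE squares each residual, a single large forecasting error dominates the score, which is why surveys of forecasting models typically report both metrics side by side.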

Key words: Time series, Transformer model, Deep learning, Attention mechanism, Prediction

Abstract: Time series forecasting, a critical technique for analyzing historical data to predict future trends, has been widely applied in fields such as finance and meteorology. However, traditional methods like the autoregressive moving average model and exponential smoothing face limitations when dealing with nonlinear patterns and capturing long-term dependencies. Recently, Transformer-based approaches, due to their self-attention mechanism, have achieved breakthroughs in natural language processing and computer vision, and have also shown significant promise in time series forecasting. Therefore, exploring how to efficiently apply Transformers to time series prediction has become crucial for advancing this field. This paper first introduces the characteristics of time series data and explains the common task categories and evaluation metrics for time series forecasting. It then delves into the basic architecture of the Transformer model and selects Transformer-derived models that have garnered widespread attention in recent years for time series forecasting. These models are categorized based on their modules and architectures, and are compared and analyzed from three perspectives: problem-solving capabilities, innovations, and limitations. Finally, this paper discusses potential future research directions for the application of Transformers in time series forecasting.
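The self-attention mechanism referred to above can be sketched in a few lines. The following is a generic scaled dot-product self-attention applied to a multivariate time-series window (the window length of 24 steps and 8 variables are arbitrary assumptions, and the identity query/key/value projections are a simplification, not the implementation of any specific surveyed model):

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    x: (seq_len, d) window of a multivariate time series.
    Returns the attended sequence and the (seq_len, seq_len) weight matrix.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # pairwise time-step similarities
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time steps
    return weights @ x, weights

rng = np.random.default_rng(0)
window = rng.standard_normal((24, 8))  # 24 time steps, 8 variables
out, w = self_attention(window)
print(out.shape, w.shape)  # (24, 8) (24, 24)
```

Each output step is a weighted mixture of every input step, which is what lets attention capture long-range dependencies; the quadratic (seq_len × seq_len) weight matrix is also the efficiency bottleneck that many of the surveyed Transformer variants set out to reduce.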

Key words: Time series, Transformer model, Deep learning, Attention mechanism, Prediction

CLC Number: 

  • TP391