Computer Science ›› 2025, Vol. 52 ›› Issue (6): 96-105. DOI: 10.11896/jsjkx.240500043

• Database & Big Data & Data Science •

Survey of Transformer-based Time Series Forecasting Methods

CHEN Jiajun1, LIU Bo1,3, LIN Weiwei2, ZHENG Jianwen3, XIE Jiachen3   

  1 School of Artificial Intelligence, South China Normal University, Guangzhou 510631, China
    2 School of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, China
    3 School of Computer Science, South China Normal University, Guangzhou 510631, China
  • Received: 2024-05-10  Revised: 2024-10-13  Online: 2025-06-15  Published: 2025-06-11
  • About author: CHEN Jiajun, born in 2001, postgraduate. His main research interests include time series forecasting.
    LIN Weiwei, born in 1980, Ph.D, professor, is a distinguished member of CCF (No.37313D). His main research interests include cloud computing, big data technology and AI application technology.
  • Supported by:
    National Natural Science Foundation of China (62072187) and Guangzhou Development Zone Science and Technology Project (2023GH02).

Abstract: Time series forecasting, a critical technique for analyzing historical data to predict future trends, is widely applied in fields such as finance and meteorology. However, traditional methods such as the autoregressive moving average model and exponential smoothing face limitations in modeling nonlinear patterns and capturing long-term dependencies. Recently, Transformer-based approaches, built on the self-attention mechanism, have achieved breakthroughs in natural language processing and computer vision and have also shown significant promise in time series forecasting. Exploring how to apply Transformers to time series forecasting efficiently has therefore become crucial for advancing the field. This paper first introduces the characteristics of time series data and describes the common task categories and evaluation metrics for time series forecasting. It then examines the basic architecture of the Transformer model and selects Transformer-derived forecasting models that have attracted widespread attention in recent years. These models are categorized by their modules and architectures, and are compared and analyzed from three perspectives: problem-solving capability, innovation, and limitation. Finally, the paper discusses potential future research directions for Transformers in time series forecasting.
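To make the self-attention computation at the core of these models concrete, the following minimal NumPy sketch applies single-head scaled dot-product attention to one embedded time series window. The random projection matrices, the 96-step window length, and the 8-dimensional embedding are illustrative assumptions rather than details from the paper; real Transformer forecasters use learned multi-head projections together with positional encodings.

```python
# A minimal, single-head scaled dot-product self-attention over one embedded
# time series window (sketch for illustration only; shapes and the random
# projections are assumptions, not details from the surveyed models).
import numpy as np

def self_attention(x, d_k=16, seed=0):
    """x: (seq_len, d_model) array of embedded time series observations."""
    rng = np.random.default_rng(seed)
    d_model = x.shape[1]
    # Q/K/V projections are learned in a real model; random here for the sketch.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # each step mixes information from all steps

# Example: a 96-step look-back window embedded into 8 dimensions.
window = np.random.default_rng(1).normal(size=(96, 8))
print(self_attention(window).shape)                 # (96, 16)
```

Because every time step attends to every other step, the attention matrix grows quadratically with the window length, a well-known efficiency limitation that motivates many of the Transformer variants this survey categorizes.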

Key words: Time series, Transformer model, Deep learning, Attention mechanism, Prediction

CLC Number: TP391