Computer Science, 2021, Vol. 48, Issue (6): 71-78. doi: 10.11896/jsjkx.200500044
HUANG Ming1,2, SUN Lin-fu1,2, REN Chun-hua1,2, WU Qi-shi1,3
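Abstract: In recent years, with the rise of data mining and machine learning, research based on time series analysis has grown increasingly rich. As a classic machine learning method, KNN (K-Nearest Neighbor) is widely used across time series analysis because it is simple and accurate. However, using the original KNN regression method to predict time series has limitations: directly using Euclidean distance as the similarity measure gives unsatisfactory predictions and cannot adapt to forecasting scenarios in which the time series exhibits an overall trend change. This paper proposes TSTF-KNN (Time Series Trend Fitting KNN), a KNN algorithm that fits the trend of a time series. The method normalizes the feature sequence at each time step, which improves the KNN similarity measure so that similar feature sequences can be searched more effectively. Because the sequences are normalized before prediction, an error term is added to the predicted result to restore the characteristics of the sequence, so that the result can be predicted effectively. To verify the method, four datasets were selected from the public Kaggle datasets and preprocessed to obtain five time series for the experiments. Prediction experiments were run on the five processed time series using TSTF-KNN, KNN, a single-layer LSTM (Long Short-Term Memory) neural network, and an ANN (Artificial Neural Network); the prediction results were analyzed and the mean square error (MSE) was compared, verifying the method's effectiveness. The experimental results show that the method effectively improves the accuracy and stability of KNN regression for time series prediction, making it better suited to forecasting scenarios in which the time series exhibits an overall trend change.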
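The abstract does not give the formulas behind TSTF-KNN, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation. It assumes that "normalization" means subtracting each feature window's own mean, and that the term added back to the prediction is the mean of the query window; both choices, along with the names `make_windows` and `knn_forecast` and the parameters `width` and `k`, are illustrative assumptions.

```python
import numpy as np


def make_windows(series, width):
    """Split a 1-D series into (feature window, next value) pairs."""
    x = np.asarray(series, dtype=float)
    feats = np.array([x[i:i + width] for i in range(len(x) - width)])
    targets = x[width:]
    return feats, targets


def knn_forecast(series, width=4, k=3, normalize=True):
    """One-step-ahead KNN regression forecast.

    With normalize=True, each window is shifted by its own mean before
    the Euclidean search (an assumed normalization), and the query
    window's mean is added back to the averaged neighbor targets.
    """
    x = np.asarray(series, dtype=float)
    feats, targets = make_windows(x, width)
    query = x[-width:]

    if normalize:
        offsets = feats.mean(axis=1)          # per-window level
        feats_n = feats - offsets[:, None]    # windows with level removed
        targets_n = targets - offsets         # targets relative to window level
        q_offset = query.mean()
        query_n = query - q_offset
    else:
        # Plain KNN regression on raw values.
        feats_n, targets_n, query_n, q_offset = feats, targets, query, 0.0

    dists = np.linalg.norm(feats_n - query_n, axis=1)
    nearest = np.argsort(dists)[:k]
    # Average the neighbors' targets, then restore the query's level.
    return targets_n[nearest].mean() + q_offset


ramp = np.arange(20.0)  # a purely trending series: 0, 1, ..., 19
print(knn_forecast(ramp, normalize=False))  # 18.0
print(knn_forecast(ramp, normalize=True))   # 20.0
```

On this linear ramp the true next value is 20. Plain KNN can only average values it has already seen, so it returns 18.0, while the level-shifted variant matches windows by shape and recovers 20.0. This illustrates why raw Euclidean distance struggles when the series has an overall trend.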