Computer Science ›› 2021, Vol. 48 ›› Issue (6): 71-78. doi: 10.11896/jsjkx.200500044

• Database & Big Data & Data Science •

Improved KNN Time Series Analysis Method

HUANG Ming1,2, SUN Lin-fu1,2, REN Chun-hua1,2, WU Qi-shi1,3

  1 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
  2 Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 610031, China
  3 Big Data Center, New Jersey Institute of Technology, Newark, New Jersey 07102, USA
  • Received: 2020-05-12 Revised: 2020-08-14 Online: 2021-06-15 Published: 2021-06-03
  • Corresponding author: SUN Lin-fu (sunlf@vip.163.com)
  • About author: HUANG Ming, born in 1996, master. His main research interests include machine learning, data mining and time series analysis. (1269662102@qq.com)
    SUN Lin-fu, born in 1963, Ph.D, professor, Ph.D supervisor, is a member of China Computer Federation. His main research interests include cloud platform technology, manufacturing industry chain collaboration technology and manufacturing industry data mining.
  • Supported by:
    National Key Research and Development Program of China (2017YFB1401400, 2017YFB1401401).

Abstract: In recent years, with the rise of data mining and machine learning, research on time series analysis has grown steadily. As a classic machine learning method, KNN (K-Nearest Neighbor) is widely used across time series analysis because of its simplicity and high accuracy. However, the original KNN regression method has limitations when predicting time series: using Euclidean distance directly as the similarity measure gives unsatisfactory predictions, and the method cannot adapt to time series with an overall trend. This paper proposes TSTF-KNN (Time Series Trend Fitting KNN), a KNN algorithm that fits the trend of a time series. It normalizes the feature sequence at each moment, which improves the KNN similarity measure and makes the search for similar feature sequences more effective. Because the sequences are normalized before prediction, an error term is added to the prediction result to restore the characteristics of the original sequence. To verify the method, 4 data sets are selected from the Kaggle public data sets and preprocessed to obtain 5 time series. Prediction experiments with TSTF-KNN, KNN, a single-layer LSTM (Long Short-Term Memory) neural network and an ANN (Artificial Neural Network) are run on these 5 time series, and the prediction results and mean square error (MSE) are compared. The experimental results show that the method effectively improves the accuracy and stability of KNN regression for time series prediction, so that it adapts better to time series with overall trend changes.

Key words: Error terms, KNN, Prediction, Similarity measure, Time series analysis

CLC Number: TP391
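The idea summarized in the abstract (normalize each feature sequence before the neighbor search, then add a term back to the prediction) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract gives no formulas, so the mean-shift normalization, the way the offset is added back, and the names `make_windows` and `knn_forecast` are all assumptions made for illustration.

```python
import numpy as np


def make_windows(series, window):
    """Split a 1-D series into (feature window, next value) pairs."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


def knn_forecast(series, window=3, k=2, trend_fit=False):
    """Predict the value that follows `series` with KNN regression.

    trend_fit=False: plain KNN on raw windows with Euclidean distance.
    trend_fit=True:  each window (and its target) is shifted by the window's
                     own mean before the search, and the query's mean is
                     added back to the averaged neighbor targets.
    The mean shift is an ASSUMED normalization, not the paper's formula.
    """
    series = np.asarray(series, dtype=float)
    X, y = make_windows(series, window)
    query = series[-window:]
    q_offset = 0.0
    if trend_fit:
        offsets = X.mean(axis=1)      # one level per training window
        X = X - offsets[:, None]      # remove the level from the features
        y = y - offsets               # and from the matching targets
        q_offset = query.mean()       # level of the query window
        query = query - q_offset
    dist = np.linalg.norm(X - query, axis=1)   # Euclidean distance
    nearest = np.argsort(dist)[:k]             # indices of the k closest windows
    return y[nearest].mean() + q_offset        # restore the query's level


series = np.arange(20.0)                       # a pure upward trend: 0, 1, ..., 19
plain = knn_forecast(series, trend_fit=False)
fitted = knn_forecast(series, trend_fit=True)
print(plain, fitted)
```

On this toy upward trend the true next value is 20. Plain KNN can only average targets it has already seen, so it returns 18.5, while the mean-shifted variant returns 20.0. That gap is the kind of trend-related limitation the abstract describes.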