Computer Science ›› 2019, Vol. 46 ›› Issue (6): 35-40. doi: 10.11896/j.issn.1002-137X.2019.06.004

• Big Data & Data Science •

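Missing Data Prediction Based on Compressive Sensing in Time Series

SONG Xiao-xiang, GUO Yan, LI Ning, WANG Meng

  1. (College of Communications Engineering, Army Engineering University, Nanjing 210007, China)
  • Received: 2018-04-18 Published: 2019-06-24
  • About the authors: SONG Xiao-xiang (born 1993), male, master's student; main research interests: signal processing and big data. GUO Yan (born 1971), female, Ph.D., professor, doctoral supervisor; main research interests: beamforming, cognitive radio, wireless sensor network localization and adaptive signal processing; E-mail: guoyan_1029@sina.com. LI Ning (born 1967), male, associate professor, master's supervisor; main research interests: Ad hoc networks and wireless cognitive networks. WANG Meng (born 1983), male, master's degree; main research interest: signal processing.
  • Supported by:
    the National Natural Science Foundation of China (61571463, 61371124, 61472445) and the Natural Science Foundation of Jiangsu Province (BK20171401).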

Abstract: Data loss occurs frequently during time series acquisition and has seriously hindered accurate data analysis. However, most existing missing data prediction algorithms find a pattern in the collected data and use it to predict the missing values, so they are feasible only when a small fraction of the data is missing. To address this problem, this paper proposes a missing data prediction algorithm based on compressive sensing. The missing data prediction problem is formulated as a multiple sparse vector recovery problem. First, a sparse representation basis is designed by exploiting the temporal smoothness of time series, which transforms missing data prediction into sparse vector recovery. Second, an observation matrix that has low coherence with the designed representation basis is constructed from the locations of the data that are not missing, which ensures the reconstruction performance of the algorithm. Simulation results show that the proposed algorithm predicts the missing data very effectively even when the data loss ratio is as high as 90%.

Key words: Compressive sensing, Missing data, Time series

CLC Number: 

  • TN911.7
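
The pipeline outlined in the abstract (sparse representation basis, observation matrix built from the positions of the surviving samples, sparse vector recovery) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the basis or the solver, so the sketch assumes a truncated low-frequency DCT basis as a stand-in for "temporal smoothness", a plain row-selection observation matrix, and orthogonal matching pursuit as the recovery step. The names `dct_basis`, `omp` and `predict_missing` are hypothetical, and the test signal is synthetic.

```python
import numpy as np


def dct_basis(n_samples, n_atoms):
    """First n_atoms columns of the orthonormal inverse DCT-II matrix.

    Assumption: low-frequency cosines stand in for the paper's
    smoothness-based sparse representation basis.
    """
    n = np.arange(n_samples)[:, None]
    k = np.arange(n_atoms)[None, :]
    psi = np.sqrt(2.0 / n_samples) * np.cos(np.pi * (2 * n + 1) * k / (2 * n_samples))
    psi[:, 0] /= np.sqrt(2.0)
    return psi


def omp(a, y, max_atoms, tol=1e-12):
    """Orthogonal matching pursuit: pick atoms greedily, refit by least squares."""
    norms = np.linalg.norm(a, axis=0)
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    while len(support) < max_atoms and np.linalg.norm(residual) > tol * np.linalg.norm(y):
        corr = np.abs(a.T @ residual) / norms
        if support:
            corr[support] = -1.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    s = np.zeros(a.shape[1])
    if support:
        s[support] = coef
    return s


def predict_missing(y, observed_idx, n_samples, n_atoms):
    """Reconstruct a length-n_samples series from the values y at observed_idx."""
    psi = dct_basis(n_samples, n_atoms)
    # Observation matrix = rows of the identity at the observed positions,
    # so (observation matrix @ basis) is just a row subset of the basis.
    a = psi[observed_idx, :]
    s_hat = omp(a, y, max_atoms=min(n_atoms, len(observed_idx)))
    return psi @ s_hat


# Synthetic smooth series that is exactly sparse in the assumed basis.
rng = np.random.default_rng(0)
n_samples, n_atoms, n_observed = 200, 16, 40
s_true = np.zeros(n_atoms)
s_true[[0, 2, 5]] = [3.0, -2.0, 1.0]
x_true = dct_basis(n_samples, n_atoms) @ s_true

# Keep 40 of 200 samples (80% missing) at random positions.
observed_idx = np.sort(rng.choice(n_samples, size=n_observed, replace=False))
x_hat = predict_missing(x_true[observed_idx], observed_idx, n_samples, n_atoms)

rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"missing rate: {1 - n_observed / n_samples:.0%}, relative error: {rel_err:.2e}")
```

Because the synthetic signal is noise-free and lies exactly in the span of the assumed atoms, this toy case says nothing about the performance reported in the paper; real series are only approximately sparse, and both the basis and the solver would need to be chosen accordingly.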