Computer Science ›› 2019, Vol. 46 ›› Issue (6): 35-40. doi: 10.11896/j.issn.1002-137X.2019.06.004
宋晓祥, 郭艳, 李宁, 王萌
SONG Xiao-xiang, GUO Yan, LI Ning, WANG Meng
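Abstract: Missing data occur frequently during time series acquisition and seriously hinder accurate data analysis. Most existing missing-data prediction algorithms infer the missing values from regularities found in the collected data, so they are not suited to cases where a large share of the data is missing. To address this, a missing-data prediction algorithm based on compressed sensing is proposed. First, the algorithm uses the temporal smoothness of time series to design a sparse representation basis, which converts missing-data prediction into a sparse vector recovery problem. Second, an observation matrix with low coherence with the sparse representation basis is designed according to the positions of the non-missing data, which ensures the reconstruction performance of the algorithm. Simulation results show that the proposed method still predicts the missing data very effectively even when the data missing rate is as high as 90%.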
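The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the basis, the observation matrix, or the solver, so everything concrete here is an assumption: an orthonormal DCT basis stands in for the smoothness-based sparse representation basis, the observation matrix simply selects the rows at the non-missing positions, and orthogonal matching pursuit (OMP) stands in for the sparse recovery step. The function names `dct_basis`, `omp`, and `predict_missing` are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT basis (assumed stand-in for the paper's basis).
    Column k is a cosine atom of frequency index k sampled at n time points."""
    t = np.arange(n)[:, None]
    k = np.arange(n)[None, :]
    psi = np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    psi[:, 0] *= np.sqrt(1.0 / n)
    psi[:, 1:] *= np.sqrt(2.0 / n)
    return psi

def omp(A, y, sparsity, tol=1e-10):
    """Orthogonal matching pursuit: greedily pick atoms, refit by least squares."""
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Normalized correlation of every atom with the current residual.
        corr = np.abs(A.T @ residual) / col_norms
        if support:
            corr[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break
    s = np.zeros(A.shape[1])
    s[support] = coef
    return s

def predict_missing(observed_values, observed_idx, n, sparsity):
    """Reconstruct a length-n series from the samples at observed_idx."""
    psi = dct_basis(n)
    # Observation matrix = rows of the identity at the observed positions,
    # so the sensing matrix Phi @ Psi is just the matching rows of psi.
    A = psi[np.asarray(observed_idx), :]
    s = omp(A, np.asarray(observed_values, dtype=float), sparsity)
    return psi @ s

# Usage: a smooth series with 90% of its samples missing.
n = 200
rng = np.random.default_rng(0)
x_true = 3.0 * dct_basis(n)[:, 4]
observed_idx = np.sort(rng.choice(n, size=20, replace=False))
x_hat = predict_missing(x_true[observed_idx], observed_idx, n, sparsity=3)
```

This sketch only shows how missing-data prediction becomes a sparse recovery problem; it does not reproduce the paper's basis design, its low-coherence observation matrix, or its reported results.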