计算机科学 ›› 2023, Vol. 50 ›› Issue (7): 82-88.doi: 10.11896/jsjkx.220600209

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于压缩感知的相关性数据填补方法

任兵, 郭艳, 李宁, 刘存涛   

  1. 陆军工程大学通信工程学院 南京 210007
  • 收稿日期:2022-06-23 修回日期:2022-07-20 出版日期:2023-07-15 发布日期:2023-07-05
  • 通讯作者: 郭艳(guoyan_1029@sina.com)
  • 作者简介:(renbing2872@sina.com)
  • 基金资助:
    国家自然科学基金(61871400);江苏省自然科学基金(BK20211227)

Method for Correlation Data Imputation Based on Compressed Sensing

REN Bing, GUO Yan, LI Ning, LIU Cuntao   

  1. College of Communication Engineering,Army Engineering University of PLA,Nanjing 210007,China
  • Received:2022-06-23 Revised:2022-07-20 Online:2023-07-15 Published:2023-07-05
  • About author:REN Bing,born in 1993,postgraduate.His main research interests include compressed sensing and big data.GUO Yan,born in 1971,Ph.D,professor.Her main research interests include unmanned intelligent system,compressed sensing and localization.
  • Supported by:
    National Natural Science Foundation of China(61871400) and Natural Science Foundation of Jiangsu Province,China(BK20211227).

摘要: 数据缺失现象在数据的采集和传输过程中经常发生,而对数据集中缺失数据的不当填补,会对后续的数据挖掘工作产生不利的影响。为了更有效地对缺失数据集进行填补,针对相关性数据,提出了一种基于压缩感知的缺失数据填补方法。首先,将缺失数据填补问题转化为压缩感知框架下的稀疏向量恢复问题;其次,针对数据的相关性特点构造了专门的稀疏表示基,从而能够更好地实现数据的稀疏化;最后,提出了一种快速迭代加权阈值算法,在传统的快速迭代收缩阈值算法的基础上引入了一种新的加权因子及重启动策略,提高了算法的收敛性能和数据的重构精度。仿真结果表明,所提算法能够高效地填补缺失数据,与传统的快速迭代收缩阈值算法相比,重构成功率和重构速度都得到了提升。同时,在数据稀疏变换效果较差的情况下,所提算法仍然能够完成对缺失数据集的填补,具有更好的鲁棒性。

关键词: 压缩感知, 数据填补, 相关性数据, 正交特征向量基, 迭代加权阈值法

Abstract: The phenomenon of missing data occurs frequently during the acquisition and transfer of data,and improper handling of missing data sets can adversely affect subsequent data mining efforts.In order to fill the missing data set more effectively,a method for data imputation based on compressed sensing is proposed for correlation data.First,the problem of missing data imputation is transformed into a sparse vector recovery problem under the compressed sensing framework.Second,a specialized sparse representation base is constructed for correlation data,so the data sparsity can be better realized.Finally,the fast iterative weighted thresholding algorithm(FIWTA) is proposed,which is refined based on the fast iterative shrinkage-thresholding algorithm (FISTA).The proposed algorithm adopts a new iterative weighted method and introduces a restart strategy,which greatly improves the convergence of the algorithm and the reconstruction accuracy of the data.Simulation results show that the proposed algorithm is able to fill the missing data efficiently,and both the reconstruction success rate and the reconstruction speed are improved compared with the traditional fast iterative shrinkage-thresholding algorithm.Meanwhile,even when the sparse transformation of the data is less effective,imputation of missing data sets can still be accomplished with better robustness.

Key words: Compressed sensing, Data imputation, Correlation data, Orthonormal eigenvector basis, Iterative weighted thresholding algorithm

中图分类号: 

  • TN911.7
[1]QIU X G,CHEN B,ZHANG P.Emergency Management Oriented Artificial Society Construction and Computational Experiments[M].Beijing:Science Press,2017:32-59.
[2]HE M.Introduction to big data-big data thinking and innovative applications[M].Beijing:Publishing House of Electronics Industry,2019:2-10.
[3]LIN W C,TSAI C F.Missing value imputation:a review andanalysis of the literature(2006-2017)[J].Artificial Intelligence Review,2020,53(2):1487-1509.
[4]HUANG G L.Missing data filling method based on linear interpolation and lightgbm[J].Journal of Physics:Conference Series,2021,1754(1):012187.
[5]SANJAR K,BEKHZOD O,KIM J,et al.Missing Data Imputation for Geolocation-based Price Prediction Using KNN-MCF Method[J].ISPRS International Journal of Geo-Information,2020,9(4):227.
[6]PRIETO-CUBIDES J,ARGOTY C.Dealing with Missing Data using a Selection Algorithm on Rough Sets[J].International Journal of Computational Intelligence Systems,2018,11(1):1307-1321.
[7]XIAO J Y,BULUT O.Evaluating the Performances of Missing Data Handling Methods in Ability Estimation From Sparse Data[J].Educational and Psychological Measurement,2020,80(5):932-954.
[8]KHALDY M A,KAMBHAMPATI C.Performance Analysis of Various Missing Value Imputation Methods on Heart Failure Dataset[C]//IntelliSys.Proceedings of SAI Intelligent Systems Conference.Berlin:Springer,2016:415-425.
[9]SAROJ A J,GUIN A,HUNTER M.Deep LSTM RecurrentNeural Networks for Arterial Traffic Volume Data Imputation[J].Journal of Big Data Analytics in Transportation,2021,3(2):95-108.
[10]CHEN X,SUN L.Bayesian Temporal Factorization for Multi-dimensional Time Series Prediction[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2021,44(9):4659-4673.
[11]LU W Q,ZHOU T,LI L H,et al.An improved tucker decomposition-based imputation method for recovering lane-level missing values in traffic data[J].IET Intelligent Transport Systems,2022,16(3):363-379.
[12]BECK A,TEBOULLE M.A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems[J].Siam J Imaging Sciences,2009,2(1):183-202.
[13]PAN S,YAN K,LAN H,et al.Adaptive step-size fast iterative shrinkage-thresholding algorithm and sparse-spike deconvolution[J].Computers & Geosciences,2020,134:104343.
[14]CALATRONI L,CHAMBOLLE A.Backtracking strategies for accelerated descent methods with smooth composite objectives[J].SIAM Journal on Optimization,2017,29(3):1-25.
[15]ZHU T.Accelerating monotone fast iterative shrinkage-thres-holding algorithm with sequential subspace optimization for sparse recovery[J].Signal Image and Video Processing,2020,14(1):1-10.
[16]KIM D,PARK D.Element-Wise Adaptive Thresholds forLearned Iterative Shrinkage Thresholding Algorithms[J].IEEE Access,2020,4:45874-45886.
[17]TONG C,TENG Y,YAO Y,et al.Eigenvalue-free iterativeshrinkage-thresholding algorithm for solving the linear inverse problems[J].Inverse Problems,2021,37(6):5867-5877.
[18]CANDES E J,TAO T.Decoding by linear programming[J].IEEE Transactions Information Theory,2005,51(12):4203-4215.
[19]WU X,XIONG Y,YANG P,et al.Sparsest Random Scheduling for Compressive Data Gathering in Wireless Sensor Networks[J].IEEE Transactions on Wireless Communications,2014,13(10):5867-5877.
[20]QUER G,MASIERO R,MUNARETTO D,et al.On the interplay between routing and signal representation for Compressive Sensing in wireless sensor networks[C]//Information Theory &Applications Workshop.2009:206-215.
[21]ELAD M.Optimized projections for compressed sensing[J].IEEE Transactions on Signal Processing,2007,55(12):5695-5702.
[22]ZHU W X,HUANG Z L,CHEN J L,et al.Iteratively weighted thresholding homotopy method for the sparse solution of underdetermined linear equations[J].Science China Mathematics,2021,64(3):639-664.
[23]LI J J,JIANG Y,QIU T,et al.The Estimation Algorithm ofOFDM Sparse Channel Based on Compressed Sensing[J].Journal of Chongqing University of Technology (Natural Science),2021,35(4):117-122.
[24]DONOGHUE B,CANDES E.Adaptive restart for acceleratedgradient schemes[J].Foundations of Computational Mathema-tics,2015,15(3):715-732.
[25]YANG L,LI H,LI P,et al.Sparse Representation for SARGround Moving Target Imaging Based on Greedy FISTA[J].Journal of Signal Processing,2020,35(11):1844-1852.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!