Computer Science ›› 2023, Vol. 50 ›› Issue (8): 52-57. doi: 10.11896/jsjkx.220500277
蔡启铨1, 卢举鸿1, 於志勇1,2, 黄昉菀1,2
CAI Qiquan1, LU Juhong1, YU Zhiyong1,2, HUANG Fangwan1,2
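Abstract: In recent years, increasingly severe air pollution has become one of the risk factors affecting human health. Air quality index (AQI) data can reveal patterns of atmospheric change to governments and can also support the control and management of air pollution. However, missing values are unavoidable during data collection, which makes the data harder to mine. To make fuller use of the data already collected, the missing values need to be completed. Existing completion methods, however, often perform poorly at high missing rates. To address this, the paper recasts the missing-matrix completion problem as a sparse-matrix reconstruction problem and designs a data completion method based on multi-dimensional sparse representation. The method first uses training data to simulate various random missing patterns and learns an overcomplete dictionary from them. It then uses the upper half of the learned dictionary to obtain a sparse representation of the matrix with missing values. Finally, it combines this sparse representation with the lower half of the dictionary to obtain the reconstructed estimate matrix. Experimental results show that the proposed method outperforms traditional matrix completion methods on multi-dimensional time-series AQI data, with a clear advantage when data loss is severe.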
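The three steps in the abstract can be illustrated with a minimal single-vector sketch. It makes several assumptions the abstract does not confirm: each training sample is taken to be a zero-filled incomplete vector stacked on top of its complete version, so that the "upper half" of the dictionary models incomplete data and the "lower half" models complete data; dictionary learning uses the method of optimal directions (MOD) and sparse coding uses orthogonal matching pursuit (OMP) as simple stand-ins, since the abstract names neither algorithm; and the data are synthetic sinusoids rather than AQI records. The multi-dimensional (joint sparsity) aspect of the proposed method is not modeled. All function and variable names are hypothetical.

```python
import numpy as np

def omp(D, y, k):
    """Greedy k-sparse code of y over the columns of D (OMP stand-in)."""
    norms = np.linalg.norm(D, axis=0) + 1e-12
    support, coef, residual = [], np.zeros(0), y.copy()
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual) / norms))
        if j in support:
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def learn_dictionary(X, n_atoms, k, n_iter=10, seed=0):
    """MOD stand-in: alternate OMP coding with a least-squares atom update."""
    rng = np.random.default_rng(seed)
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        A = np.column_stack([omp(D, x, k) for x in X.T])
        D = X @ np.linalg.pinv(A)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D

def make_signals(n_samples, n, rng):
    """Synthetic stand-in data: one random sinusoid per column."""
    t = np.arange(n)
    freq = rng.uniform(0.05, 0.25, size=(n_samples, 1))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
    return np.sin(2.0 * np.pi * freq * t + phase).T  # shape (n, n_samples)

rng = np.random.default_rng(0)
n, n_atoms, k, miss_rate = 12, 48, 3, 0.5

# Step 1: simulate random missingness on training data, learn the dictionary
# on stacked [incomplete; complete] samples (assumed layout).
X_train = make_signals(300, n, rng)
train_mask = rng.random(X_train.shape) > miss_rate
stacked = np.vstack([X_train * train_mask, X_train])
D = learn_dictionary(stacked, n_atoms, k)
D_up, D_low = D[:n], D[n:]

# Step 2: sparse-code a new incomplete vector with the upper half.
x_true = make_signals(1, n, rng)[:, 0]
mask = rng.random(n) > miss_rate  # True where the value was observed
y = x_true * mask
alpha = omp(D_up, y, k)

# Step 3: reconstruct with the lower half, keeping observed entries as-is.
x_hat = D_low @ alpha
x_filled = np.where(mask, y, x_hat)
```

Overwriting only the missing positions, as in the last line, is a design choice of this sketch; the abstract states only that the sparse representation is combined with the lower half of the dictionary to form the estimate.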