Computer Science, 2023, Vol. 50, Issue (8): 52-57. doi: 10.11896/jsjkx.220500277

• Database & Big Data & Data Science •

Data Completion of Air Quality Index Based on Multi-dimensional Sparse Representation

CAI Qiquan1, LU Juhong1, YU Zhiyong1,2, HUANG Fangwan1,2

  1. 1 College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
    2 Fujian Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350108, China
  • Received: 2022-05-30 Revised: 2022-11-04 Online: 2023-08-15 Published: 2023-08-02
  • Corresponding author: HUANG Fangwan (hfw@fzu.edu.cn)
  • About author: (2292639773@qq.com)
    CAI Qiquan, born in 1997, undergraduate. His main research interests include data completion.
    HUANG Fangwan, born in 1980, postgraduate, senior lecturer, is a member of China Computer Federation. Her main research interests include computational intelligence, machine learning and big data analysis.
  • Supported by:
    National Natural Science Foundation of China (61772136), Fujian Provincial Guiding Project (2020H0008) and Educational Research Project for Young and Middle-aged Teachers in Fujian Province (JAT210007).

Abstract: In recent years, increasingly serious air pollution has become one of the risk factors affecting people's health. Air quality index (AQI) data can show the government how the atmospheric environment changes, and can also be used for air pollution control and management. However, missing values are inevitable during data collection, which makes the data harder to mine. To make fuller use of the data already collected, it is necessary to complete the missing values. Existing completion methods, however, often perform poorly at high missing rates. This paper therefore transforms the missing-matrix completion problem into a sparse-matrix reconstruction problem and designs a data completion method based on multi-dimensional sparse representation. The method first uses the training data to simulate various random missing cases for over-complete dictionary learning. Then, the sparse representation of the matrix with missing values is obtained using the upper part of the learned dictionary. Finally, this sparse representation is combined with the lower part of the dictionary to obtain the reconstructed estimation matrix. Experimental results show that the proposed method outperforms traditional matrix completion methods on multi-dimensional time-series AQI data, with a clear advantage when the data are severely missing.

Key words: Air quality index, Missing data, Matrix completion, Dictionary learning, Multi-dimensional sparse representation
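
The three steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: missing entries are zero-filled; each dictionary atom stacks a simulated-missing copy of a series (upper half) on top of the complete series (lower half); raw training pairs stand in for a learned dictionary; and a plain single-vector orthogonal matching pursuit replaces the multi-dimensional sparse solver. The names `build_stacked_dictionary`, `omp`, and `complete` are hypothetical.

```python
import numpy as np


def build_stacked_dictionary(train, rng, missing_rate=0.3):
    """Stack a simulated-missing copy (upper half) on the complete data (lower half).

    train has shape (n_samples, n); the result has shape (2n, n_samples).
    Assumption: training pairs are used directly as atoms, not learned.
    """
    mask = rng.random(train.shape) >= missing_rate  # True means observed
    upper = np.where(mask, train, 0.0)              # zero-fill simulated gaps
    return np.vstack([upper.T, train.T]), mask


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D."""
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0
    x = np.zeros(D.shape[1])
    support = []
    residual = y.astype(float)
    for _ in range(k):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = 0.0                   # never pick an atom twice
        j = int(np.argmax(scores))
        if scores[j] < 1e-10:                       # residual already explained
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x


def complete(y_obs, observed, D, k=3):
    """Sparse-code on the upper half, reconstruct with the lower half."""
    n = y_obs.size
    D_upper, D_lower = D[:n], D[n:]
    x = omp(D_upper, np.where(observed, y_obs, 0.0), k)
    estimate = D_lower @ x
    return np.where(observed, y_obs, estimate)      # keep observed values as-is


rng = np.random.default_rng(0)
t = np.arange(24)
# Synthetic daily-cycle series; purely illustrative, not AQI data.
train = np.array([a * np.sin(2 * np.pi * (t + p) / 24) + 50.0
                  for a in (10.0, 20.0, 30.0) for p in range(24)])
D, masks = build_stacked_dictionary(train, rng)

y_true = 15.0 * np.sin(2 * np.pi * (t + 3) / 24) + 50.0  # unseen series
observed = rng.random(24) >= 0.3
completed = complete(np.where(observed, y_true, np.nan), observed, D)
print("mean abs error on missing entries:",
      np.abs(completed - y_true)[~observed].mean())
```

In this sketch the upper half of the dictionary has 24 rows and 72 columns, so it is over-complete in the sense used in the abstract. A learned dictionary (for example via K-SVD) and a joint-sparsity solver would replace `build_stacked_dictionary` and `omp` in a fuller treatment.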

CLC Number: 

  • TP391