Computer Science ›› 2023, Vol. 50 ›› Issue (8): 52-57.doi: 10.11896/jsjkx.220500277

• Database & Big Data & Data Science •

Data Completion of Air Quality Index Based on Multi-dimensional Sparse Representation

CAI Qiquan1, LU Juhong1, YU Zhiyong1,2, HUANG Fangwan1,2   

  1. College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
  2. Fujian Key Laboratory of Network Computing and Intelligent Information Processing (Fuzhou University), Fuzhou 350108, China
  • Received:2022-05-30 Revised:2022-11-04 Online:2023-08-15 Published:2023-08-02
  • About author: CAI Qiquan, born in 1997, undergraduate. His main research interests include data completion.
    HUANG Fangwan, born in 1980, postgraduate, senior lecturer, is a member of China Computer Federation. Her main research interests include computational intelligence, machine learning and big data analysis.
  • Supported by:
    National Natural Science Foundation of China(61772136),Fujian Provincial Guiding Project(2020H0008) and Educational Research Project for Young and Middle-aged Teachers in Fujian Province(JAT210007).

Abstract: In recent years, air pollution has grown increasingly serious and has become one of the risk factors affecting public health. The air quality index (AQI) can reveal to governments the patterns of atmospheric environmental change and can also support air pollution control. However, data are inevitably lost during collection, which makes data mining difficult. Because existing completion methods perform poorly at high missing rates, this paper transforms the missing-matrix completion problem into a sparse-matrix reconstruction problem and designs a data completion method based on multi-dimensional sparse representation. The method first uses the training data to simulate various random missing cases for over-complete dictionary learning. Then, the sparse representation of the matrix with missing values is obtained using the upper part of the learned dictionary. Finally, the sparse representation is combined with the lower part of the dictionary to obtain the reconstructed estimation matrix. Experimental results show that the proposed algorithm outperforms traditional matrix completion methods on multi-dimensional AQI time series, especially when the missing rate is high.
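
The three steps in the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the authors' implementation. Several simplifications are assumptions made here for brevity: the training columns themselves serve as dictionary atoms (no dictionary learning step is run), a plain orthogonal matching pursuit stands in for the sparse solver, a single vector is completed rather than a multi-dimensional matrix, and all function names (`build_stacked_dictionary`, `omp`, `complete`) are hypothetical.

```python
import numpy as np

def build_stacked_dictionary(X_train, miss_rate, rng):
    """Pair each complete training column (lower part) with a randomly
    masked, zero-filled copy of itself (upper part).
    Assumption: training columns are used directly as atoms."""
    observed = rng.random(X_train.shape) >= miss_rate  # True = value kept
    D_up = X_train * observed                          # simulated missing data
    norms = np.linalg.norm(D_up, axis=0)
    norms[norms == 0] = 1.0                            # guard fully masked columns
    # Both parts share one scale factor so a code found on D_up transfers to D_low.
    return D_up / norms, X_train / norms

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy sparse code of y over D (k >= 1 atoms)."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        scores = np.abs(D.T @ residual)
        if support:
            scores[support] = -np.inf                  # never pick an atom twice
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def complete(y_missing, D_up, D_low, k):
    """Code the incomplete vector on the upper part, rebuild with the lower part."""
    alpha = omp(D_up, y_missing, k)
    return D_low @ alpha

# Usage on synthetic data (not AQI measurements).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(24, 50))                    # 50 complete training vectors
D_up, D_low = build_stacked_dictionary(X_train, miss_rate=0.5, rng=rng)
y = D_up[:, 3]                                         # an incomplete, zero-filled vector
x_hat = complete(y, D_up, D_low, k=1)                  # completed estimate
```

Sharing one scale factor between the two parts is what lets a sparse code computed on incomplete data be reused to reconstruct complete data. A learned over-complete dictionary and a joint solver for several pollutant series would replace the simplifications above.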

Key words: Air quality index, Missing data, Matrix completion, Dictionary learning, Multi-dimensional sparse representation

CLC Number: 

  • TP391