Computer Science, 2022, Vol. 49, Issue (8): 56-63. doi: 10.11896/jsjkx.210600180

• Database & Big Data & Data Science

RIIM: Real-Time Imputation Based on Individual Models

LI Xia, MA Qian, BAI Mei, WANG Xi-te, LI Guan-yu, NING Bo   

  1. School of Information Science & Technology, Dalian Maritime University, Dalian, Liaoning 116026, China
  • Received: 2021-06-28 Revised: 2021-10-19 Published: 2022-08-02
  • About author: LI Xia, born in 1997, postgraduate, is a member of China Computer Federation. Her main research interests include data cleaning and sensory data management.
    MA Qian, born in 1988, Ph.D., is a member of China Computer Federation. Her main research interests include data cleaning and sensory data management.
  • Supported by:
    National Natural Science Foundation of China (62002039, 61602076, 61702072, 61976032), China Postdoctoral Science Foundation Funded Projects (2017M611211, 2017M621122, 2019M661077), Natural Science Foundation of Liaoning Province (20180540003), CERNET Innovation Project (NGII20190902) and Fundamental Research Funds for Central Universities (3132021239).

Abstract: As data sources multiply, data become easy to obtain but are often of low quality, so missing values (MVs) are ubiquitous and hard to avoid. MV imputation has therefore become a classical problem in data quality management. However, most existing MV imputation approaches are designed for static data and cannot handle dynamic data streams that arrive at high speed. Moreover, they do not consider data sparsity and heterogeneity simultaneously. This paper therefore proposes a novel MV imputation approach, real-time imputation based on individual models (RIIM). RIIM fills MVs by combining the basic ideas of neighbor-based imputation and regression-based imputation while accounting for both the sparsity and the heterogeneity of the data. To cope with the dynamic, real-time nature of data streams, the MV imputation model is updated incrementally. In addition, an adaptive and periodic updating strategy for the optimal-neighbor search is proposed to address the high time cost of the search and the difficulty of determining the number of neighbors. Finally, the effectiveness of RIIM is evaluated through extensive experiments on real-world datasets.
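
The abstract names three ingredients: neighbor-based imputation, regression-based imputation, and incremental model updates. The sketch below is one minimal way these ideas could fit together. It is not the authors' RIIM algorithm. The class names IndividualModel and StreamImputer, the single-attribute linear model, the fixed neighbor count k, and the plain averaging of neighbor predictions are all assumptions made for illustration. In particular, the adaptive and periodic neighbor search described in the abstract is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndividualModel:
    """Hypothetical per-tuple model: y = a + b*x, stored as running sums."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # O(1) incremental update: past tuples are never revisited.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def predict(self, x: float) -> float:
        denom = self.n * self.sxx - self.sx * self.sx
        if self.n < 2 or abs(denom) < 1e-12:
            # Too little or degenerate data: fall back to the mean of y.
            return self.sy / self.n
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a + b * x


class StreamImputer:
    """Assumed setting: tuples (x, y) arrive one by one; y may be missing."""

    def __init__(self, k: int = 3) -> None:
        self.k = k  # fixed here; the paper adapts the neighbor count
        self.xs: List[float] = []
        self.ys: List[float] = []
        self.models: List[IndividualModel] = []

    def _neighbors(self, x: float) -> List[int]:
        # Indices of the k complete tuples closest to x (linear scan).
        order = sorted(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))
        return order[: self.k]

    def observe(self, x: float, y: Optional[float]) -> float:
        """Consume one tuple and return y, imputed if it was missing."""
        if y is None:
            return self.impute(x)
        model = IndividualModel()
        model.add(x, y)
        for i in self._neighbors(x):
            model.add(self.xs[i], self.ys[i])  # new model learns from neighbors
            self.models[i].add(x, y)           # neighbors' models absorb the new tuple
        self.xs.append(x)
        self.ys.append(y)
        self.models.append(model)
        return y

    def impute(self, x: float) -> float:
        near = self._neighbors(x)
        if not near:
            raise ValueError("no complete tuples seen yet")
        # Each neighbor's individual model votes; the votes are averaged.
        preds = [self.models[i].predict(x) for i in near]
        return sum(preds) / len(preds)
```

As a usage example, feeding the complete tuples (x, 2x + 1) for x = 0, ..., 5 and then calling observe(2.5, None) should return a value close to 6, because every individual model has only seen points on that line. Keeping one small model per tuple, rather than a single global regression, is what lets such a scheme follow heterogeneous data, and the running sums keep each update at constant cost per affected model.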

Key words: Data streams, Heterogeneity, Missing value, Real-time imputation, Sparsity

CLC Number: 

  • TP3-05