Computer Science, 2019, Vol. 46, Issue (2): 30-34. doi: 10.11896/j.issn.1002-137X.2019.02.005

Section: Big Data & Data Science

Improved ROUSTIDA Algorithm for Missing Data Imputation with Key Attribute in Repetitive Data

FAN Zhe-ning, YANG Qiu-hui, ZHAI Yu-peng, WAN Ying, WANG Shuai   

  1. College of Computer Science (Software Engineering), Sichuan University, Chengdu 610065, China
  Received: 2017-12-05; Online: 2019-02-25; Published: 2019-02-25

Abstract: With the rise of data analysis, data pre-processing, and in particular the imputation of missing data, has attracted increasing attention. Building on the ROUSTIDA algorithm, this paper proposes an improved variant, the Key&Rpt_RS algorithm. Key&Rpt_RS retains the advantages of ROUSTIDA, takes into account the repetitiveness of real-world data, and analyzes how a key attribute influences imputation quality. Finally, experiments on alarm data from a communication network show that Key&Rpt_RS imputes missing data more effectively than the traditional ROUSTIDA algorithm.
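For readers unfamiliar with the baseline, the following is a minimal Python sketch of the classic ROUSTIDA idea (rough-set-based imputation through a discernibility relation). It is not the paper's Key&Rpt_RS algorithm, and it rests on several assumptions: the table is an information system without a decision attribute, missing values are marked as `None`, and two objects count as indiscernible when every attribute known for both has equal values. A missing cell is filled only when all indiscernible objects that know that attribute agree on a single value; conflicting or absent candidates leave the cell missing, and the fallback strategies of the original algorithm and its improved variants are not modeled. The function name `roustida_impute` and the `max_rounds` cap are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence

# Assumption: None stands for a missing value ("*" in rough-set notation).
Value = Optional[Hashable]


def roustida_impute(rows: Sequence[Sequence[Value]], max_rounds: int = 100) -> List[List[Value]]:
    """Simplified ROUSTIDA-style imputation (hypothetical helper, not Key&Rpt_RS)."""
    data = [list(r) for r in rows]  # work on a copy; the input is not modified
    n = len(data)

    def indiscernible(x: List[Value], y: List[Value]) -> bool:
        # Empty discernibility entry: no attribute where both are known and differ.
        return all(a is None or b is None or a == b for a, b in zip(x, y))

    for _ in range(max_rounds):
        # No-difference set NS_i for every object, based on the current table.
        ns = [[j for j in range(n) if j != i and indiscernible(data[i], data[j])]
              for i in range(n)]
        updates = []
        for i in range(n):
            for k, value in enumerate(data[i]):
                if value is not None:
                    continue
                # Known values of attribute k among the objects in NS_i.
                candidates = {data[j][k] for j in ns[i] if data[j][k] is not None}
                if len(candidates) == 1:  # unanimous: safe to fill
                    updates.append((i, k, candidates.pop()))
        if not updates:  # nothing more can be filled
            break
        # Apply all fills at once, then recompute NS_i in the next round.
        for i, k, value in updates:
            data[i][k] = value
    return data


if __name__ == "__main__":
    table = [["high", "yes", "A"],
             ["high", None, "A"],
             ["low", "no", "B"],
             [None, "no", "B"]]
    print(roustida_impute(table))
    # [['high', 'yes', 'A'], ['high', 'yes', 'A'], ['low', 'no', 'B'], ['low', 'no', 'B']]
```

Recomputing the no-difference sets after each round lets newly filled values enable further fills. A conflict is left unresolved: with rows `["x", 1]`, `["x", 2]`, `["x", None]`, the last cell stays `None` because its two indiscernible neighbours disagree.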

Key words: Data pre-processing, Missing data imputation, Repeated data, ROUSTIDA algorithm

CLC Number: TP391