Computer Science, 2017, Vol. 44, Issue (12): 58-63. doi: 10.11896/j.issn.1002-137X.2017.12.011


Fuzzy Clustering Algorithm for Incomplete Data Considering Missing Pattern

ZHENG Qi-bin, DIAO Xing-chun and CAO Jian-jun   

  • Online: 2018-12-01 Published: 2018-12-01

Abstract: Data integrality (completeness) is an important metric of data availability. Because of problems in data acquisition, real-world datasets are often incomplete. Common clustering algorithms usually either ignore missing data or impute them. When data are missing not at random (MNAR), ignoring or imputing them results in poor clustering accuracy. Considering the relationship between the missing pattern and the missing values, two PCM (possibilistic c-means) clustering algorithms are proposed: PatDistPCM, based on minimizing the sum of missing pattern distances, and PatCluPCM, based on clustering the missing patterns. Experiments on public datasets show that both proposed fuzzy clustering algorithms improve clustering precision and recall when the data are missing not at random.

Key words: Data integrality, Fuzzy clustering, MNAR, Missing pattern, Possibilistic c-means
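
The abstract does not give the formulas of PatDistPCM or PatCluPCM, so the following is only a minimal sketch of the standard building blocks it refers to, not the authors' algorithms. It assumes missing values are encoded as NaN, represents a record's missing pattern as a binary indicator vector, uses the well-known partial distance strategy for incomplete records, and applies the classical possibilistic c-means typicality formula. The function names `missing_pattern`, `partial_distance_sq`, and `pcm_typicality` are hypothetical and chosen for illustration.

```python
import numpy as np

def missing_pattern(X):
    """Binary missing pattern: 1 where a value is missing (NaN), else 0."""
    return np.isnan(X).astype(int)

def partial_distance_sq(x, v):
    """Squared partial distance between record x and prototype v.

    Only observed attributes of x contribute; the sum is rescaled by
    (total attributes / observed attributes). Assumes v has no NaN.
    """
    observed = ~np.isnan(x)
    n_obs = observed.sum()
    if n_obs == 0:
        raise ValueError("record has no observed attributes")
    diff = x[observed] - v[observed]
    return x.size / n_obs * np.sum(diff ** 2)

def pcm_typicality(d2, eta, m=2.0):
    """Classical PCM typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1)))."""
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

# Toy data (illustrative values only, not from the paper).
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 2.0, np.nan]])
patterns = missing_pattern(X)                 # one pattern row per record
d2 = partial_distance_sq(X[0], np.zeros(3))   # distance to a zero prototype
u = pcm_typicality(d2, eta=15.0)              # typicality of record 0
```

In this sketch the missing pattern is computed but not yet used in the clustering objective; how the pattern enters the objective is exactly what the two proposed algorithms define, and that detail is not recoverable from the abstract.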

