Computer Science, 2016, Vol. 43, Issue (6): 263-269. doi: 10.11896/j.issn.1002-137X.2016.06.052


Research for Uncertain Data Clustering Algorithm: U-PAM and UM-PAM Algorithm

HE Yun-bin, ZHANG Zhi-chao, WAN Jing and LI Song   

Online: 2018-12-01  Published: 2018-12-01

Abstract: The UK-means algorithm is highly sensitive to outliers when dealing with uncertain data, and it requires the probability density or distribution function of the uncertain data to be known in advance, which is often difficult to obtain in practice. To address these shortcomings of UK-means in handling uncertain measurement data, this paper first proposes a new algorithm, U-PAM, based on the PAM algorithm and interval numbers. U-PAM uses intervals and the standard deviation to describe the uncertainty of measurement data in a reasonable way, so that the data can be clustered effectively. Second, clustering massive data sets is often difficult. To address this, the paper proposes the UM-PAM algorithm, which is based on sampling techniques and handles massive uncertain measurement data efficiently: it first clusters a sample of the data and then clusters the full data set. Finally, the U-PAM clustering results are analyzed together with the CH validity index to determine the optimal number of clusters. Experimental results show that the proposed algorithms produce clearly effective clustering results.

Key words: Uncertain data, Intervals, Clustering, PAM
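The abstract describes U-PAM, UM-PAM and the CH-based choice of cluster number only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each uncertain attribute is represented as the interval [mean - std, mean + std], that the distance between two objects is the Euclidean distance over all interval lower and upper bounds (the paper's actual interval distance is not given here), that UM-PAM means "run PAM on a random sample, then assign every object to its nearest sample medoid", and that "CH" refers to the Calinski-Harabasz index, computed here on the flattened interval bounds. All function names (`to_intervals`, `interval_dist`, `pam`, `um_pam`, `ch_index`) are hypothetical.

```python
import numpy as np


def to_intervals(means, stds):
    # Assumption: each uncertain attribute becomes the interval [mean - std, mean + std].
    means, stds = np.asarray(means, float), np.asarray(stds, float)
    iv = np.stack([means - stds, means + stds], axis=-1)  # shape (n, d, 2)
    return iv.reshape(len(iv), -1)  # (n, 2d): [lo_1, hi_1, lo_2, hi_2, ...]


def interval_dist(a, b):
    # Assumed interval distance: Euclidean distance over all lower/upper bounds.
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))  # pairwise matrix, shape (len(a), len(b))


def pam(x, k, max_iter=100):
    """Classic PAM (BUILD + SWAP); returns the medoid interval vectors."""
    dist = interval_dist(x, x)
    n = len(x)
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:  # BUILD: greedily add the medoid that lowers total cost most
        nearest = dist[:, medoids].min(axis=1)
        costs = [np.inf if c in medoids else np.minimum(nearest, dist[:, c]).sum()
                 for c in range(n)]
        medoids.append(int(np.argmin(costs)))
    cost = dist[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):  # SWAP: apply the best improving medoid/non-medoid swap
        best, best_cost = None, cost
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:i] + [c] + medoids[i + 1:]
                trial_cost = dist[:, trial].min(axis=1).sum()
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
        if best is None:  # no swap lowers the total cost: converged
            break
        medoids, cost = best, best_cost
    return x[medoids]


def um_pam(x, k, sample_size, seed=0):
    # Sampling sketch: cluster a random sample with PAM, then assign every object
    # in the full data set to its nearest sample medoid.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=min(sample_size, len(x)), replace=False)
    medoid_vectors = pam(x[idx], k)
    return np.argmin(interval_dist(x, medoid_vectors), axis=1)


def ch_index(x, labels):
    # Calinski-Harabasz index on the flattened interval vectors:
    # (between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k)).
    clusters = np.unique(labels)
    n, k = len(x), len(clusters)
    centre = x.mean(axis=0)
    between = within = 0.0
    for c in clusters:
        members = x[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - centre) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


# Usage: three synthetic groups of uncertain 2-D measurements (40 objects each).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
means = np.vstack([c + rng.normal(0.0, 0.5, size=(40, 2)) for c in centers])
stds = rng.uniform(0.05, 0.3, size=means.shape)
x = to_intervals(means, stds)

# Pick the number of clusters with the largest CH score.
scores = {k: ch_index(x, um_pam(x, k, sample_size=60)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
labels = um_pam(x, best_k, sample_size=60)
```

Choosing the k with the largest CH score mirrors the abstract's use of the CH index to select the number of clusters. A CLARA-style variant of the sampling step would draw several samples and keep the medoid set with the lowest total cost on the full data; whether UM-PAM does this is not stated in the abstract.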
