Computer Science, 2016, Vol. 43, Issue (6): 263-269. doi: 10.11896/j.issn.1002-137X.2016.06.052

• Artificial Intelligence •

Research on the U-PAM and UM-PAM Algorithms for Uncertain Data Clustering

HE Yun-bin, ZHANG Zhi-chao, WAN Jing, LI Song

  1. School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080
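  • Published: 2018-12-01  Online: 2018-12-01
  • Funding:
    This work was supported by the Science and Technology Research Project of the Heilongjiang Provincial Education Department (12511100) and the Natural Science Foundation of Heilongjiang Province (F201302, F201134).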

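Abstract: The UK-means algorithm is highly sensitive to outliers when processing uncertain data, and it requires the distribution function or probability density of the uncertain data to be known in advance, which is often difficult to obtain in practice. To address these shortcomings of UK-means on uncertain measurement data, this paper first proposes U-PAM, a PAM-based clustering algorithm for uncertain data built on interval numbers. U-PAM describes the uncertainty of measurement data with interval numbers and standard deviations, which allows the data to be clustered effectively. Second, because massive volumes of uncertain measurement data are difficult to cluster, the paper extends U-PAM with sampling techniques into UM-PAM, an algorithm for massive uncertain measurement data: it first draws a sample, clusters the sample data, and then clusters the whole data set. Finally, the clustering results of U-PAM are analyzed with the CH cluster validity index to determine the optimal number of clusters. Experimental results show that the proposed algorithms cluster the data effectively.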
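The U-PAM pipeline summarized above (describe each uncertain measurement with interval numbers and standard deviations, cluster with PAM, then choose the number of clusters with the CH index) can be sketched in Python. This is a minimal illustration, not the authors' implementation: turning repeated readings into the interval [mean - sd, mean + sd], comparing interval objects by Euclidean distance over their endpoints, and reading CH as the Calinski-Harabasz index are all assumptions, as are the hypothetical names `to_interval_object`, `pam`, `calinski_harabasz` and `best_k`.

```python
import math
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def to_interval_object(readings: Sequence[Sequence[float]]) -> Vector:
    """Turn repeated readings of one object into interval endpoints.

    Assumption: attribute t becomes [mean_t - sd_t, mean_t + sd_t], stored flat
    as [lo_1, hi_1, lo_2, hi_2, ...] so Euclidean distance compares endpoints.
    """
    out: Vector = []
    for column in zip(*readings):  # one column per attribute
        m, s = mean(column), pstdev(column)
        out += [m - s, m + s]
    return out


def pam(objects: List[Vector], k: int) -> Tuple[List[int], List[int]]:
    """Classic PAM: greedy BUILD, then SWAP until no swap lowers the total cost."""
    n = len(objects)
    d = [[math.dist(a, b) for b in objects] for a in objects]

    def cost(meds: List[int]) -> float:
        return sum(min(d[i][m] for m in meds) for i in range(n))

    medoids: List[int] = []
    for _ in range(k):  # BUILD: add the object that lowers the cost the most
        medoids.append(min((c for c in range(n) if c not in medoids),
                           key=lambda c: cost(medoids + [c])))

    current, improved = cost(medoids), True
    while improved:  # SWAP: accept any medoid/non-medoid swap that lowers cost
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                cand = medoids[:mi] + [h] + medoids[mi + 1:]
                c = cost(cand)
                if c < current - 1e-12:
                    medoids, current, improved = cand, c, True

    labels = [min(range(k), key=lambda j: d[i][medoids[j]]) for i in range(n)]
    return medoids, labels


def calinski_harabasz(objects: List[Vector], labels: List[int], k: int) -> float:
    """CH(k) = [tr(B) / (k - 1)] / [tr(W) / (n - k)]; needs 2 <= k < n."""
    n, dim = len(objects), len(objects[0])
    overall = [mean(x[t] for x in objects) for t in range(dim)]
    between = within = 0.0
    for j in range(k):
        members = [x for x, lab in zip(objects, labels) if lab == j]
        centre = [mean(x[t] for x in members) for t in range(dim)]
        between += len(members) * math.dist(centre, overall) ** 2
        within += sum(math.dist(x, centre) ** 2 for x in members)
    return (between / (k - 1)) / (within / (n - k))


def best_k(objects: List[Vector], k_values: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Run PAM for each candidate k and keep the k with the largest CH value."""
    scores = {k: calinski_harabasz(objects, pam(objects, k)[1], k) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical readings: six objects in three groups, two readings of two attributes each.
readings = [
    [[1.0, 1.0], [1.2, 0.8]], [[0.9, 1.1], [1.1, 1.0]],
    [[5.0, 5.1], [5.2, 4.9]], [[4.9, 5.0], [5.1, 5.2]],
    [[9.0, 1.0], [9.2, 1.1]], [[8.9, 0.9], [9.1, 1.2]],
]
objects = [to_interval_object(r) for r in readings]
k, scores = best_k(objects, range(2, 4))
medoids, labels = pam(objects, k)
print(k, labels)
```

Storing each interval as its two endpoints keeps the sketch simple: the same flat vectors serve both the medoid search and the centroid-based CH computation. The exhaustive SWAP step makes this version quadratic in the number of objects per pass, which is the cost that motivates the sampling variant.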

Keywords: Uncertain data, Interval numbers, Clustering algorithm, PAM
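The sample-then-cluster idea behind UM-PAM, as described in the abstract, can be sketched in a CLARA-like way: draw a random sample, run PAM on the sample only, then assign every object to its nearest sample medoid. The abstract does not say how the final whole-data step works, so the nearest-medoid assignment, the sample size, the seed and the hypothetical names `pam_medoids` and `um_pam` are assumptions; objects are assumed to be already converted to interval endpoint vectors such as [lower, upper].

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def total_cost(points: Sequence[Vector], medoids: Sequence[Vector]) -> float:
    """Sum of Euclidean distances from each point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam_medoids(points: List[Vector], k: int) -> List[Vector]:
    """Compact PAM: start from the first k points, then keep any swap that lowers the cost."""
    meds = list(range(k))
    current, improved = total_cost(points, [points[i] for i in meds]), True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(len(points)):
                if h in meds:
                    continue
                cand = meds[:mi] + [h] + meds[mi + 1:]
                c = total_cost(points, [points[i] for i in cand])
                if c < current - 1e-12:
                    meds, current, improved = cand, c, True
    return [points[i] for i in meds]


def um_pam(points: List[Vector], k: int, sample_size: int,
           seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Sample, run PAM on the sample only, then label every point by its nearest medoid."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    medoids = pam_medoids(sample, k)
    labels = [min(range(k), key=lambda j: math.dist(p, medoids[j])) for p in points]
    return medoids, labels


# Hypothetical data: 60 one-attribute interval objects [lower, upper] in three groups.
points = [(cx + 0.1 * i, cy + 0.1 * j)
          for cx, cy in [(0.0, 1.0), (5.0, 6.0), (10.0, 11.0)]
          for i in range(5) for j in range(4)]
medoids, labels = um_pam(points, k=3, sample_size=30, seed=1)
print(medoids)
```

In this sketch the swap search touches only the sampled objects, and the full data set is read once, in the final nearest-medoid pass. Another reading of "then clusters the whole data set", such as re-running PAM on all objects starting from the sample medoids, would trade part of that saving for a refined result.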


