General Weighted Minkowski Distance and Quantum Genetic Clustering Algorithm

Computer Science, 2013, Vol. 40, Issue (5): 224-228.

QIAN Guo-hong, HUANG De-cai and LU Yi-hong

Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023

Online: 2018-11-16 Published: 2018-11-16
Funding:
This work was supported by the National Science and Technology Major Project for Water Pollution Control and Treatment (2009ZX07318-003-01-02) and the Special Fund for Public Welfare Industry Research of the Ministry of Water Resources (201001031).

Abstract

Dissimilarity and similarity measures are a key factor in clustering algorithms and often affect the result of a cluster analysis. Many clustering algorithms use the Euclidean distance as their similarity measure. The Euclidean distance, however, does not reflect the global characteristics of attribute values and ignores differences in units between attributes, so it performs poorly when attributes differ markedly in units or value ranges. To address this, the paper proposes a generalized weighted Minkowski distance in which the generalized weight of each attribute is determined by that attribute's unit and value-range information. The measure takes the characteristics of the whole data set into account and removes the inconsistency between attributes, while the introduction of quantiles reduces, to some extent, the influence of noisy attribute values on the distance. The new distance is applied to the classic k-means algorithm and to a quantum genetic clustering algorithm. Experimental results show that clustering with the new distance measure and with the quantum genetic algorithm is more effective.

Key words: Data clustering, Minkowski distance, Quantile, Global information, Quantum genetic algorithm (QGA)
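
The abstract does not give the formula for the generalized weights, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that each attribute's weight is the reciprocal of its inter-quantile spread over the whole data set, and that each attribute difference is scaled by its weight before the Minkowski power is applied. The names `quantile`, `quantile_weights`, and `weighted_minkowski`, and the 0.25/0.75 quantile levels, are hypothetical choices.

```python
from typing import List, Sequence


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile of an already sorted sequence, using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantile_weights(data: Sequence[Sequence[float]],
                     low: float = 0.25, high: float = 0.75) -> List[float]:
    """One weight per attribute, computed from the whole data set.

    Assumed rule (not taken from the paper): weight = 1 / (Q_high - Q_low).
    Using quantiles instead of min/max keeps a few extreme values from
    dominating the spread estimate.
    """
    weights = []
    for column in zip(*data):
        s = sorted(column)
        spread = quantile(s, high) - quantile(s, low)
        # An attribute with zero spread is left unscaled.
        weights.append(1.0 / spread if spread > 0 else 1.0)
    return weights


def weighted_minkowski(x: Sequence[float], y: Sequence[float],
                       weights: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance of order p on weight-scaled attribute differences."""
    total = sum((w * abs(a - b)) ** p for a, b, w in zip(x, y, weights))
    return total ** (1.0 / p)


# Two attributes whose value ranges differ by a factor of 1000.
data = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0],
        [4.0, 4000.0], [5.0, 5000.0]]
weights = quantile_weights(data)
distance = weighted_minkowski(data[0], data[1], weights)
```

In this toy data set the second attribute would dominate a plain Euclidean distance because of its larger range. After weighting, both attributes contribute equally to `distance`. A k-means variant along these lines would compute `weights` once from the full data set and use `weighted_minkowski` in the assignment step.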

QIAN Guo-hong, HUANG De-cai and LU Yi-hong. General Weighted Minkowski Distance and Quantum Genetic Clustering Algorithm[J]. Computer Science, 2013, 40(5): 224-228. https://doi.org/
