基于空间距离自适应权重度量的粗糙K-means算法

doi:10.11896／j.issn.1002-137X.2018.07.033

计算机科学 ›› 2018, Vol. 45 ›› Issue (7): 190-196.doi: 10.11896／j.issn.1002-137X.2018.07.033

基于空间距离自适应权重度量的粗糙K-means算法

王慧研¹,张腾飞¹,马福民²

南京邮电大学自动化学院南京210023¹;
南京财经大学信息工程学院南京210023²

收稿日期:2017-05-18 出版日期:2018-07-30 发布日期:2018-07-30
作者简介:王慧研(1990-),女,硕士生,主要研究方向为粗糙聚类算法;张腾飞(1980-),男,博士,教授,硕士生导师,主要研究方向为智能信息处理、智能控制等,E-mail:tfzhang@126.com(通信作者);马福民(1979-),女,博士,副教授,硕士生导师,主要研究方向为智能信息处理、智能生产系统等。
基金资助:
本文受国家自然科学基金项目(61403184),江苏省高校自然科学研究重大项目(17KJA120001),江苏省“青蓝工程”基金(QL2016),南京邮电大学“1311人才计划”基金(NY2013),南京邮电大学科研项目基金(NY215149)资助。

Rough K-means Algorithm with Self-adaptive Weights Measurement Based on Space Distance

WANG Hui-yan¹,ZHANG Teng-fei¹,MA Fu-min²

College of Automation,Nanjing University of Posts and Telecommunications,Nanjing 210023,China¹;
College of Information Engineering,Nanjing University of Finance and Economics,Nanjing 210023,China²

Received:2017-05-18 Online:2018-07-30 Published:2018-07-30

摘要/Abstract

摘要： 粗糙K-means算法中下近似和边界区域权重系数的设置对算法的聚类效果有着重要的影响。传统的粗糙K-means算法及很多改进的粗糙K-means算法对所有类簇的下近似和边界区域设置固定的权重,忽视了簇内数据对象分布差异性的影响。针对这个问题,根据下近似和边界区域的数据对象相对于类簇中心的空间分布情况,提出一种新的基于空间距离自适应权重度量的粗糙K-means算法。该算法在每次迭代过程中,根据每个类簇的下近似和边界区域的数据对象相对于类簇中心的平均距离,综合度量下近似和边界区域对于类簇中心迭代计算的不同重要程度,动态地计算下近似和边界区域的相对权重系数。通过实例验证及实验仿真证明了所提算法的有效性。

关键词: 粗糙K-means, 粗糙集, 聚类算法, 自适应权重

Abstract: The setting of weights coefficient of lower approximation and boundary area in rough K-means algorithm has an important influence on final clustering results of algorithm.However,traditional rough K-means and many refined rough K-means algorithms set up fixed weights of lower approximations and boundary area for all clusters,ignoring the effect of distribution difference of data objects within clusters.To cope with this problem,a new rough K-means algorithm with self-adaptive weights measurement based on space distance was proposed according to the spatial distribution of objects in lower approximation and boundary area relative to the cluster centers.During each iteration process,diffe-rent importance of lower approximation and boundary area on iterative computation of cluster centers was measured based on average distance of objects in lower approximation and boundary area relative to cluster centers and the relative weights coefficient of lower approximation and boundary area were dynamically calculated.The validity of the algorithm was verified by experimental analysis.

Key words: Clustering algorithm, Rough k-means, Rough set, Self-adaptive weight

中图分类号:

TP391

王慧研,张腾飞,马福民. 基于空间距离自适应权重度量的粗糙K-means算法[J]. 计算机科学, 2018, 45(7): 190-196. https://doi.org/10.11896／j.issn.1002-137X.2018.07.033

WANG Hui-yan,ZHANG Teng-fei,MA Fu-min. Rough K-means Algorithm with Self-adaptive Weights Measurement Based on Space Distance[J]. Computer Science, 2018, 45(7): 190-196. https://doi.org/10.11896／j.issn.1002-137X.2018.07.033

参考文献

[1]HAN J,KAMBER M.Data mining,concepts and techniques(third edition) [M].California:Morgan Kaufmann Publishers,2011.
[2]HARTIGAN J A,WONG M A.A K-Means Clustering Algorithm [J].Journal of the Royal Statistical Society,1979,28(1):100-108.
[3]AMORIM R C D.A Survey on Feature Weighting BasedK-Means Algorithms[J].Journal of Classification,2016,33(2):210-242.
[4]KHANDARE A,ALVI A S.Survey of Improved k-means Clustering Algorithms:Improvements,Shortcomings and Scope for Further Enhancement and Scalability[C]∥Information Systems Design and Intelligent Applications.New Delhi:Springer,2016:495-503.
[5]BEZDEK J C.Fuzzy Mathematics in Pattern Classification[D].Ithaca:Cornell University,1973.
[6]BEZDEK J C.Pattern Recognition with Fuzzy Objective function Algorithms [M].New York:Plenum Press,1981.
[7]JIANG Z H,LI T T,MIN W F,et al.Fuzzy c-means clustering based on weights and gene expression programming [J].Pattern Recognition Letters,2017,90:1-7.
[8]CHEN H P,SHEN X J,LV Y D,et al.A novel automatic fuzzy clustering algorithm based on softpartition and membership information [J].Neurocomputing,2017,236(SI):104-112.
[9]KRISHNAPURAM R,KELLER J M.A Possibilistic Approach to Clustering [J].IEEE Transactions on Fuzzy Systems,1993,1(2):98-110.
[10]KRISHNAPURAM R,KELLER J M.The Possibilistic C-Means Algorithm:Insights and Recommendations [J].IEEE Transactions on Fuzzy Systems,1996,4(3):385-393.
[11]BARNIM,CAPPELLINI V,MECOCCI A.Comments on A Possibilistic Approach to Clustering [J].IEEE Transactions on Fuzzy Systems,1996,4(3):393-396.
[12]LINGRAS P,WEST C.Interval Set Clustering of Web Users with Rough K-Means [J].Journal of Intelligent Information Systems,2004,23(1):5-16.
[13]PAWLAK Z.Rough Sets[J].International Journal of Computer &Information Sciences,1982,11(5):341-356.
[14]MAJI P,PAL S K.RFCM:A Hybrid Clustering AlgorithmUsing Rough and Fuzzy Sets[J].Fundamenta Informaticae,2007,80(4):475-496.
[15]MITRA S,BANKA H,PEDRYCZ W.Rough-Fuzzy Collabora-tive Clustering [J].IEEE Transactions on Systems Manand Cybernetics Part B Cybernetics,2006,36(4):795-805.
[16]PAUL S,MAJI P.A New Rough-Fuzzy Clustering Algorithmand its Applications [C]∥Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012).New Delhi:Springer,2014:1245-1251.
[17]SHI J,LEI Y,ZHOU Y,et al.Enhanced rough-fuzzy c-means algorithm with strict rough setsproperties [J].Applied Soft Computing,2016,46:827-850.
[18]ZHANG T F,CHEN L,MA F M.A modified rough c-meansclustering algorithm based on hybrid imbalanced measureof distance and density[J].International Journal of Approximate Reasoning,2014,55(8):1805-1818.
[19]WANG H,ZHOU M.A refined rough k-means clustering with hybrid threshold [C]∥International Conference on Rough Sets and Current Trends in Computing.Berlin Heidelberg:Springer,2012:26-35.
[20]MALYSZKO D,STEPANIUK J.Rough Entropy Based k-Means Clustering [C]∥Rough Sets,Fuzzy Sets,Data Mining and Granular Computing.Berlin Heidelberg:Springer,2009:406-413.
[21]PAL S K,SHANKAR B U,MITRA P.Granular computing,rough entropy and object extraction[J].Pattern Recognition Letters,2005,26(16):2509-2517.
[22]PETERS G.Rough Clustering Utilizing the Principle of Indiffe-rence [J].Information Sciences,2014,277(2):358-374.
[23]WANG X E,HAN D Q,HAN C Z.Selection method for Para-meters of Rough Fuzzy C-Means Clustering Based on Uncertainty Measurement[J].Journal of Xi’an Jiaotong University,2013,47(6):55-60.(in Chinese)
王学恩,韩德强,韩崇昭.采用不确定性度量的粗糙模糊C均值聚类参数获取方法[J].西安交通大学学报,2013,47(6):55-60.
[24]PETERS G.Some refinements of rough k-means clustering [J].Pattern Recognition,2006,39(8):1481-1491.
[25]BEZDEK J C,PAL N R.Some New Indexes of Cluster Validity [J].IEEE Transactions on System Manand Cybernetics Part B Cybernetics,1988,28(3):301-315.

相关文章 15

[1]	柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203
[2]	程富豪, 徐泰华, 陈建军, 宋晶晶, 杨习贝. 基于顶点粒k步搜索和粗糙集的强连通分量挖掘算法 Strongly Connected Components Mining Algorithm Based on k-step Search of Vertex Granule and Rough Set Theory 计算机科学, 2022, 49(8): 97-107. https://doi.org/10.11896/jsjkx.210700202
[3]	许思雨, 秦克云. 基于剩余格的模糊粗糙集的拓扑性质 Topological Properties of Fuzzy Rough Sets Based on Residuated Lattices 计算机科学, 2022, 49(6A): 140-143. https://doi.org/10.11896/jsjkx.210200123
[4]	方连花, 林玉梅, 吴伟志. 随机多尺度序决策系统的最优尺度选择 Optimal Scale Selection in Random Multi-scale Ordered Decision Systems 计算机科学, 2022, 49(6): 172-179. https://doi.org/10.11896/jsjkx.220200067
[5]	陈于思, 艾志华, 张清华. 基于三角不等式判定和局部策略的高效邻域覆盖模型 Efficient Neighborhood Covering Model Based on Triangle Inequality Checkand Local Strategy 计算机科学, 2022, 49(5): 152-158. https://doi.org/10.11896/jsjkx.210300302
[6]	赵亮, 张洁, 陈志奎. 基于双图正则化的自适应多模态鲁棒特征学习 Adaptive Multimodal Robust Feature Learning Based on Dual Graph-regularization 计算机科学, 2022, 49(4): 124-133. https://doi.org/10.11896/jsjkx.210300078
[7]	孙林, 黄苗苗, 徐久成. 基于邻域粗糙集和Relief的弱标记特征选择方法 Weak Label Feature Selection Method Based on Neighborhood Rough Sets and Relief 计算机科学, 2022, 49(4): 152-160. https://doi.org/10.11896/jsjkx.210300094
[8]	王子茵, 李磊军, 米据生, 李美争, 解滨. 基于误分代价的变精度模糊粗糙集属性约简 Attribute Reduction of Variable Precision Fuzzy Rough Set Based on Misclassification Cost 计算机科学, 2022, 49(4): 161-167. https://doi.org/10.11896/jsjkx.210500211
[9]	王志成, 高灿, 邢金明. 一种基于正域的三支近似约简 Three-way Approximate Reduction Based on Positive Region 计算机科学, 2022, 49(4): 168-173. https://doi.org/10.11896/jsjkx.210500067
[10]	薛占熬, 侯昊东, 孙冰心, 姚守倩. 带标记的不完备双论域模糊概率粗糙集中近似集动态更新方法 Label-based Approach for Dynamic Updating Approximations in Incomplete Fuzzy Probabilistic Rough Sets over Two Universes 计算机科学, 2022, 49(3): 255-262. https://doi.org/10.11896/jsjkx.201200042
[11]	张亚迪, 孙悦, 刘锋, 朱二周. 结合密度参数与中心替换的改进K-means算法及新聚类有效性指标研究 Study on Density Parameter and Center-Replacement Combined K-means and New Clustering Validity Index 计算机科学, 2022, 49(1): 121-132. https://doi.org/10.11896/jsjkx.201100148
[12]	李艳, 范斌, 郭劼, 林梓源, 赵曌. 基于k-原型聚类和粗糙集的属性约简方法 Attribute Reduction Method Based on k-prototypes Clustering and Rough Sets 计算机科学, 2021, 48(6A): 342-348. https://doi.org/10.11896/jsjkx.201000053
[13]	李杉, 许新征. 基于双角度并行剪枝的VGG16优化方法 Parallel Pruning from Two Aspects for VGG16 Optimization 计算机科学, 2021, 48(6): 227-233. https://doi.org/10.11896/jsjkx.200800016
[14]	汤鑫瑶, 张正军, 储杰, 严涛. 基于自然最近邻的密度峰值聚类算法 Density Peaks Clustering Algorithm Based on Natural Nearest Neighbor 计算机科学, 2021, 48(3): 151-157. https://doi.org/10.11896/jsjkx.200100112
[15]	曹林, 于威威. 基于图像分割的自适应窗口双目立体匹配算法研究 Adaptive Window Binocular Stereo Matching Algorithm Based on Image Segmentation 计算机科学, 2021, 48(11A): 314-318. https://doi.org/10.11896/jsjkx.201200264

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

基于空间距离自适应权重度量的粗糙K-means算法

Rough K-means Algorithm with Self-adaptive Weights Measurement Based on Space Distance

PDF (PC)

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 15

Metrics

本文评价

推荐阅读 0