Computer Science (计算机科学), 2021, Vol. 48, Issue (1): 145-151. doi: 10.11896/jsjkx.200400043

• Database & Big Data & Data Science •

Weighted Hesitant Fuzzy Clustering Based on Density Peaks

ZHANG Yu, LU Yi-hong, HUANG De-cai

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2020-04-10 Revised: 2020-08-08 Online: 2021-01-15 Published: 2021-01-15
  • Corresponding author: LU Yi-hong (lyh@zjut.edu.cn)
  • About author: onlyyousee6@163.com. ZHANG Yu, born in 1994, postgraduate, is a member of China Computer Federation. Her main research interests include data mining.
    LU Yi-hong, born in 1968, master, associate professor, is a member of China Computer Federation. Her main research interests include software theory and data mining.
  • Supported by:
    Zhejiang Public Welfare Technology Research Project (LGG19E090001).

Abstract: Because human cognition is limited and information is uncertain, traditional fuzzy clustering cannot effectively handle cluster analysis for real-world decision-making problems, so clustering algorithms for hesitant fuzzy sets (HFSs) have been proposed. The concept of hesitant fuzzy sets evolved from fuzzy sets as applied in the fuzzy linguistic approach. The existing hierarchical hesitant fuzzy K-means clustering algorithm does not use information from the dataset itself to determine the weights of its distance function, and both the computational complexity and the space complexity of computing cluster centers are exponential, which makes it unsuitable for big data environments. To address these problems, this paper proposes a weighted hesitant fuzzy clustering algorithm based on density peaks, called WHFDP. First, a method for extending the shorter hesitant fuzzy element set is given so that the distance between two HFSs can be calculated, and a new formula for the weights of the distance function is derived from the coefficient of variation. Then, cluster centers are selected by density peaks, which reduces the complexity of computing cluster centers and improves adaptability to datasets of different sizes and arbitrary shapes; the time complexity and space complexity of the algorithm are reduced to polynomial level. Finally, simulation experiments on typical datasets demonstrate the effectiveness of the proposed algorithm.

Key words: Clustering algorithm, Coefficient of variation, Data mining, Density peaks, Hesitant fuzzy sets
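
The abstract names three steps (extending short hesitant fuzzy elements, weighting the distance by the coefficient of variation, and picking cluster centers by density peaks) without giving formulas. The sketch below is only one plausible reading, not the paper's WHFDP definition. It assumes that short elements are padded with their own mean, that attribute weights are normalized coefficients of variation of per-object element means, that the distance is a weighted hesitant Euclidean distance, and that density uses a Gaussian kernel with a hypothetical `cutoff` parameter. All function names are illustrative.

```python
import math
from statistics import mean, pstdev

def extend(hfe, length):
    """Sort an HFE and pad it to `length` with its own mean (assumed rule)."""
    values = sorted(hfe)
    pad = mean(values)
    return sorted(values + [pad] * (length - len(values)))

def cv_weights(data):
    """One weight per attribute: normalized coefficient of variation
    of the per-object HFE means (assumed formula)."""
    n_attrs = len(data[0])
    cvs = []
    for j in range(n_attrs):
        scores = [mean(obj[j]) for obj in data]
        mu = mean(scores)
        cvs.append(pstdev(scores) / mu if mu > 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        return [1.0 / n_attrs] * n_attrs  # fall back to equal weights
    return [cv / total for cv in cvs]

def distance(a, b, weights):
    """Weighted hesitant Euclidean distance between two HFSs."""
    total = 0.0
    for ha, hb, w in zip(a, b, weights):
        length = max(len(ha), len(hb))
        ea, eb = extend(ha, length), extend(hb, length)
        total += w * sum((x - y) ** 2 for x, y in zip(ea, eb)) / length
    return math.sqrt(total)

def density_peak_centers(data, k, cutoff):
    """Return indices of the k objects with the largest rho * delta."""
    weights = cv_weights(data)
    n = len(data)
    dist = [[distance(data[i], data[j], weights) for j in range(n)]
            for i in range(n)]
    # Local density: Gaussian kernel over distances to all other objects.
    rho = [sum(math.exp(-(dist[i][j] / cutoff) ** 2)
               for j in range(n) if j != i) for i in range(n)]
    # delta: distance to the nearest object of higher density; the densest
    # object(s) take their largest distance instead.
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(n), key=lambda i: gamma[i], reverse=True)[:k]

# Each object is a list of attributes; each attribute is an HFE
# (a list of membership degrees in [0, 1]).
data = [
    [[0.1, 0.2], [0.2]], [[0.15], [0.2, 0.3]], [[0.2], [0.25]],
    [[0.8, 0.9], [0.7]], [[0.85], [0.7, 0.8]], [[0.9], [0.75]],
]
print(density_peak_centers(data, k=2, cutoff=0.1))
```

Building the full distance matrix costs a number of distance evaluations quadratic in the number of objects, which is consistent with the polynomial complexity claimed in the abstract. The padding rule, the weight formula, and the choice of `cutoff` should all be checked against the paper before reuse.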

CLC Number:

  • TP391