Computer Science, 2021, Vol. 48, Issue (1): 145-151. doi: 10.11896/jsjkx.200400043

• Database & Big Data & Data Science •

Weighted Hesitant Fuzzy Clustering Based on Density Peaks

ZHANG Yu, LU Yi-hong, HUANG De-cai

  1. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
  • Received: 2020-04-10  Revised: 2020-08-08  Online: 2021-01-15  Published: 2021-01-15
  • Corresponding author: LU Yi-hong (lyh@zjut.edu.cn)
  • Author e-mail: onlyyousee6@163.com
  • About the authors: ZHANG Yu, born in 1994, postgraduate, is a member of China Computer Federation. Her main research interests include data mining.
    LU Yi-hong, born in 1968, master, associate professor, is a member of China Computer Federation. Her main research interests include software theory and data mining.
  • Supported by:
    Zhejiang Public Welfare Technology Research Project (LGG19E090001).

Abstract: Because human cognition is limited and information is uncertain, traditional fuzzy clustering cannot effectively handle real-world decision problems in cluster analysis, which has motivated clustering algorithms for hesitant fuzzy sets (HFSs). The concept of hesitant fuzzy sets evolved from fuzzy sets, which are applied in the fuzzy linguistic approach. The existing hierarchical hesitant fuzzy K-means clustering algorithm does not use information from the dataset itself to determine the weights of its distance function, and both the time and space complexity of its cluster-center computation are exponential, which makes it unsuitable for big-data environments. To address these problems, this paper proposes a weighted hesitant fuzzy clustering algorithm based on density peaks, called WHFDP. First, a method for extending (padding) the shorter hesitant fuzzy element set is given so that the distance between two HFSs can be computed, and a new formula for the distance-function weights is derived from the coefficient of variation. Cluster centers are then selected by density peaks, which not only reduces the complexity of computing the cluster centers but also improves adaptability to datasets of different sizes and arbitrary shapes; the time and space complexity of the algorithm are reduced to polynomial order. Finally, simulation experiments on typical datasets demonstrate the effectiveness of the proposed algorithm.

Key words: Data mining, Clustering algorithm, Hesitant fuzzy sets, Density peaks, Coefficient of variation
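
The abstract names two ingredients of the distance computation: padding the shorter hesitant fuzzy element (HFE) and weighting attributes by their coefficient of variation. The page does not give the paper's formulas, so the following is only a minimal sketch under stated assumptions: short HFEs are padded by repeating their largest (or smallest) value, each attribute's weight is its coefficient of variation over the mean HFE scores, normalized to sum to 1, and the distance is a weighted hesitant normalized Hamming distance. The function names `pad_hfe`, `cv_weights` and `weighted_distance` are hypothetical.

```python
from statistics import mean, pstdev


def pad_hfe(hfe, length, optimistic=True):
    """Return the HFE sorted ascending and padded to `length` values.

    Assumption (not taken from the paper): the short element is extended
    by repeating its largest (optimistic) or smallest (pessimistic) value.
    """
    values = sorted(hfe)
    filler = values[-1] if optimistic else values[0]
    return sorted(values + [filler] * (length - len(values)))


def cv_weights(dataset):
    """Attribute weights from the coefficient of variation (std / mean).

    `dataset` is a list of objects; each object is a list of HFEs, one per
    attribute. Assumption: each HFE is summarized by its mean score, and
    the per-attribute coefficients are normalized to sum to 1.
    """
    n_attrs = len(dataset[0])
    cvs = []
    for j in range(n_attrs):
        scores = [mean(obj[j]) for obj in dataset]
        mu = mean(scores)
        cvs.append(pstdev(scores) / mu if mu > 0 else 0.0)
    total = sum(cvs)
    if total == 0:
        # No attribute varies: fall back to equal weights.
        return [1.0 / n_attrs] * n_attrs
    return [cv / total for cv in cvs]


def weighted_distance(a, b, weights):
    """Weighted hesitant normalized Hamming distance between two objects."""
    total = 0.0
    for hfe_a, hfe_b, w in zip(a, b, weights):
        length = max(len(hfe_a), len(hfe_b))
        pa, pb = pad_hfe(hfe_a, length), pad_hfe(hfe_b, length)
        total += w * sum(abs(x - y) for x, y in zip(pa, pb)) / length
    return total


# Three objects described by two attributes; HFEs have unequal lengths.
data = [
    [[0.2, 0.4], [0.5]],
    [[0.6], [0.5, 0.7]],
    [[0.9], [0.6]],
]
weights = cv_weights(data)
d01 = weighted_distance(data[0], data[1], weights)
```

With weights derived this way, an attribute whose scores vary more across the dataset contributes more to the distance, which is the sense in which the weights come from the data itself.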

CLC Number:

  • TP391
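
The abstract states that cluster centers are chosen by density peaks rather than computed. As a rough illustration of that idea, the sketch below follows the general density-peaks scheme (local density, distance to the nearest denser point, centers taken as the points where the product of the two is largest). It is not the WHFDP procedure itself: the Gaussian kernel, the cutoff `d_c`, the fixed number of centers `k` and the function name `density_peaks` are assumptions made for this example, and any precomputed symmetric distance matrix (for instance one built from a hesitant fuzzy distance) can be passed in.

```python
import math


def density_peaks(dist, d_c, k):
    """Pick k cluster centers by density peaks and label every point.

    dist: symmetric n x n distance matrix (list of lists).
    d_c:  cutoff distance used by the Gaussian density kernel (assumed).
    k:    number of centers to keep (assumed to be given by the user).
    """
    n = len(dist)
    # Local density: Gaussian-kernel sum over all other points.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Points in order of decreasing density (ties keep index order).
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n   # distance to the nearest point of higher density
    parent = [-1] * n   # index of that point (-1 for the densest point)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        parent[i] = j
    # Centers: the k points with the largest rho * delta.
    centers = sorted(range(n), key=lambda i: -(rho[i] * delta[i]))[:k]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Visit points by decreasing density so each parent is labeled first.
    for i in order:
        if labels[i] != -1:
            continue
        if parent[i] == -1:
            # Densest point not chosen as a center: use the nearest center.
            nearest = min(centers, key=lambda c: dist[i][c])
            labels[i] = labels[nearest]
        else:
            labels[i] = labels[parent[i]]
    return centers, labels


# Two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 50.0, 51.0, 52.0]
dist = [[abs(p - q) for q in points] for p in points]
centers, labels = density_peaks(dist, d_c=5.0, k=2)
```

Because centers are picked from existing objects, no new center has to be constructed, and the whole procedure runs in time quadratic in the number of objects once the distance matrix is available.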