计算机科学 ›› 2020, Vol. 47 ›› Issue (3): 79-86.doi: 10.11896/jsjkx.190400123
陈俊芬,张明,赵佳成
CHEN Jun-fen,ZHANG Ming,ZHAO Jia-cheng
摘要: 机器学习的无监督聚类算法已被广泛应用于各种目标识别任务。基于密度峰值的快速搜索聚类算法(DPC)能快速有效地确定聚类中心点和类个数,但在处理复杂分布形状的数据和高维图像数据时仍存在聚类中心点不容易确定、类数偏少等问题。为了提高其处理复杂高维数据的鲁棒性,文中提出了一种基于学习特征表示的密度峰值快速搜索聚类算法(AE-MDPC)。该算法采用无监督的自动编码器(AutoEncoder)学出数据的最优特征表示,结合能刻画数据全局一致性的流形相似性,提高了同类数据间的紧致性和不同类数据间的分离性,促使潜在类中心点的密度值成为局部最大。在4个人工数据集和4个真实图像数据集上将AE-MDPC与经典的K-means,DBSCAN,DPC算法以及结合了PCA的DPC算法进行比较。实验结果表明,在外部评价指标聚类精度、内部评价指标调整互信息和调整兰德指数上,AE-MDPC的聚类性能优于对比算法,而且提供了更好的可视化性能。总之,基于特征表示学习且结合流形距离的AE-MDPC算法能有效地处理复杂流形数据和高维图像数据。
中图分类号:
[1]QUEEN J M.Some methods for classification and analysis of multivariate observations[C]∥Proc of the fifth Berkeley symposium on mathematical statistics and probability.Oakland:Lucien Marie Le Cam,1967:281-297. [2]ISMKHAN H.I-k-means-+:An Iterative Clustering Algorithm Based on an Enhanced Version of the k-means[J].Pattern Re-cognition,2018,79:402-413. [3]JIA R Y,LI Y G.K-means algorithm for self-determination of cluster number and initial center[J].Computer Engineering and Application,2018,54 (7):152-158. [4]BIRANT D,KUT A.ST-DBSCAN:An algorithm for clustering spatial-temporal data[J].Data & Knowledge Engineering,2007,60(1):208-221. [5]HOU J,GAO H J,LI X L.DSets-DBSCAN:A Parameter-Free Clustering Algorithm[J].IEEE Transactions on Image Proces-sing,2016,25(7):3182-3193. [6]TRAN T N,DRAB K,DASZYKOWSKI M.Revised DBSCAN algorithm to cluster data with dense adjacent clusters[J].Chemometrics & Intelligent Laboratory Systems,2013,120(2):92-96. [7]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496. [8]XIE J Y,GAO H C,XIE W X.K Nearest Neighbor Optimized Density Peak Fast Search Clustering Algorithms[J].Chinese Science:Information Science,2016,46 (2):258-280. [9]MEHMOOD R,BIE R,DAWOOD H,et al.Fuzzy Clustering by Fast Search and Find of Density Peaks[J].Personal & Ubiquitous Computing,2016,20(5):785-793. [10]LI C Y,DING G Y,WANG D K.Clustering by Fast Search and Find of Density Peaks with Data Field[J].Chinese Journal of Electronics,2016,25(3):397-402. [11]XU J,WANG G Y,DENG W H.DenPEHC:Density Peak based Efficient Hierarchical Clustering[J].Information Sciences,2016,373(12):200-218. [12]LU Y H,XIA C.Optimal K-Nearest Neighbor and Local Density Clustering Algorithms for Uncertain Data[J].Control and Decision-making,2016,31(3):541-546. [13]WANG P F,YANG Y W,KE Y Q.Research on Optimization of fast clustering algorithm for peak density[J].Computer Engineering and Science,2018,40(8):1503-1510. [14]XIE J Y,QU Y N.K-medoids clustering algorithm for initial center of peak density optimization [J].Computer Science and Exploration,2016,10(2):230-247. [15]WANG S L,WANG D K,LI C Y,et al.Clustering by fast search and find of density peaks with data field[J].Chinese Journal of Electronics,2016,25(3):397-402. [16]XIE J Y,GAO H C,XIE W X,et al.Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors[J].Information Sciences,2016,354(C):19-40. [17]DU M J,DING S F,JIA H J.Study on density peaks clustering based on k-nearest neighbors and principal component analysis[J].Knowledge-Based Systems,2016,99:135-145. [18]LIU R,WANG H,YU X M.Shared-nearest-neighbor-based Clustering by Fast Search and Find of Density Peaks[J].Information Sciences,2018,450:200-226. [19]GOTTUMUKKAL R.An improved face recognition technique based on modular PCA approach[J].Pattern Recognit Lett,2004,25(4):429-436. [20]KE Y,SUKTHANKAR R.PCA-SIFT:a more distinctive representation for local image descriptors[C]∥Proc of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition.Washington:IEEE,2004. [21]ZHANG J Q,ZHANG H Y.Fast search clustering algorithm for density peak based on manifold distance[J].ComputerKnow-ledge and Technology,2017,13(2):179-182. [22]YANG H,FU Y,FAN D.The effect of noise characteristics on the internal validity of clustering[J].Computer Science,2018,45(7):22-30. [23]TENENBAUM J B,SILVA V D E,LANGFORD J C.A global geometric framework for nonlinear dimensionality reduction[J].Science,2000,290(5500):2319-2323. [24]RUMELHART D E,HINTON G E,WILLIAMS R J.Learning representations by back-propagating errors[J].Nature,1986,323(6088):533-536. [25]YOSHUA B,LAMBLIN P,POPOVICI D,et al.Greedy layer- wise training of deep networks[C]∥Advances in Neural Information Processing Systems (NIPS’06).2006:153-160. [26]VINH N X,EPPS J,BAILEY J.Bibliometrics:Information theo- retic measures for clusterings comparison[C]∥Proc of the International Conference on Machine Learning.New York:ACM,2010,2837-2854. [27]YANN L C,L′EON B,YOSHUA B,et al.Gradient-based lear- ning applied to document recognition[J].Proceedings of the IEEE,1998,86(11):2278-2324. |
[1] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于分层抽样优化的面向异构客户端的联邦学习 Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients 计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263 |
[2] | 柴慧敏, 张勇, 方敏. 基于特征相似度聚类的空中目标分群方法 Aerial Target Grouping Method Based on Feature Similarity Clustering 计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203 |
[3] | 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩. 基于DBSCAN聚类的集群联邦学习方法 Clustered Federated Learning Methods Based on DBSCAN Clustering 计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059 |
[4] | 郁舒昊, 周辉, 叶春杨, 王太正. SDFA:基于多特征融合的船舶轨迹聚类方法研究 SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion 计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253 |
[5] | 毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞. 基于密度敏感距离和模糊划分的改进FCM算法 FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition 计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042 |
[6] | 陈景年. 一种适于多分类问题的支持向量机加速方法 Acceleration of SVM for Multi-class Classification 计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149 |
[7] | 刘丽, 李仁发. 医疗CPS协作网络控制策略优化 Control Strategy Optimization of Medical CPS Cooperative Network 计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230 |
[8] | 陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳. 三维城市场景中的小物体检测 Small Object Detection in 3D Urban Scenes 计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174 |
[9] | 邢云冰, 龙广玉, 胡春雨, 忽丽莎. 基于SVM的类别增量人体活动识别方法 Human Activity Recognition Method Based on Class Increment SVM 计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024 |
[10] | 朱哲清, 耿海军, 钱宇华. 面向化学结构的线段聚类算法 Line-Segment Clustering Algorithm for Chemical Structure 计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131 |
[11] | 张宇姣, 黄锐, 张福泉, 隋栋, 张虎. 基于菌群优化的近邻传播聚类算法研究 Study on Affinity Propagation Clustering Algorithm Based on Bacterial Flora Optimization 计算机科学, 2022, 49(5): 165-169. https://doi.org/10.11896/jsjkx.210800218 |
[12] | 左园林, 龚月姣, 陈伟能. 成本受限条件下的社交网络影响最大化方法 Budget-aware Influence Maximization in Social Networks 计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228 |
[13] | 韩洁, 陈俊芬, 李艳, 湛泽聪. 基于自注意力的自监督深度聚类算法 Self-supervised Deep Clustering Algorithm Based on Self-attention 计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001 |
[14] | 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞. 基于节点相似性和网络嵌入的复杂网络社区发现算法 Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding 计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009 |
[15] | 蒲实, 赵卫东. 一种面向动态科研网络的社区检测算法 Community Detection Algorithm for Dynamic Academic Network 计算机科学, 2022, 49(1): 89-94. https://doi.org/10.11896/jsjkx.210100023 |
|