计算机科学 ›› 2023, Vol. 50 ›› Issue (10): 59-70.doi: 10.11896/jsjkx.230600010
张喜梅1,3, 解滨1,2,3, 米据生4, 徐童童1,3, 张祎玲1,3
ZHANG Ximei1,3, XIE Bin1,2,3, MI Jusheng4, XU Tongtong1,3, ZHANG Yiling1,3
摘要: 谱聚类算法是建立在图论的基础上,将聚类问题转化为图的划分问题,能识别任意形状的类簇且易于实现,因此比传统聚类算法具有更强的适应性。然而,该算法中常用的距离度量不能同时考虑全局和局部一致性,且易受到噪声影响;聚类结果依赖由输入数据构造的相似度矩阵,且通过特征分解得到松弛划分矩阵和离散化过程的两步独立策略难以得到一个共同最优解。因此,提出一种结合共享近邻和流形距离的自适应谱聚类算法(SNN-MSC),引入一种新的具有指数项和比例因子的流形距离,可以灵活调整同一流形内数据的相似度和不同流形之间数据的相似度之比,并将密度因子纳入流形距离度量中,以消除噪声影响;采用共享近邻重新定义相似度度量,能挖掘数据点之间的空间结构和局部关系;同时,对拉普拉斯矩阵施加秩约束,使相似度矩阵中的连通分量完全等于簇个数,能够在优化求解过程中自适应优化数据相似度矩阵和聚类结构,无须再进行离散化操作。在人工数据集和UCI真实数据集上的对比实验显示,所提算法在多个聚类有效性指标上能体现出更好的性能。
中图分类号:
[1]JAIN A K,MURTY M N,FLYNN P J.Data clustering:a review[J].ACM Computing Surveys,1999,31(3):264-323. [2]LIU R,WANG H,YU X M.Shared-nearest-neighbor-basedclustering by fast search and find of density peaks[J].Information Sciences,2018,450:200-226. [3]CHAKRABARTI S.Data mining for hypertext:A tutorial survey[J].ACM SIGKDD Explorations Newsletter,2000,1(2):1-11. [4]SAMARIA F S,HARTER A C.Parameterisation of a stochastic model for human face identification[C]//Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.Seattle:IEEE,1994:138-142. [5]DUAN X Y,LIU Y N,WANG X B.SDN enabled 5G-VANET:Adaptive vehicle clustering and beamformed transmission for aggregated traffic[J].IEEE Communications Magazine,2017,55(7):120-127. [6]HU Z L,TANG J S,WANG Z M,et al.Deep learning for image-based cancer detection and diagnosis-A survey[J].Pattern Recognition,2018,83:134-149. [7]DOUZAS G,BACAO F,LAST F.Improving imbalanced lear-ning through a heuristic oversampling method based on k-means and SMOTE[J].Information Sciences,2018,465:1-20. [8]LI G B,ZHANG R F,CHEN C,et al.Rotation-invariant Deep Hierarchical Cluster Network for Point Cloud Analysis[J].Journal of Software,2022,33(11):4356-4378. [9]TAO X M,GUO W J,REN C,et al.Density peak clusteringusing global and local consistency adjustable manifold distance[J].Information Sciences,2021,577:769-804. [10]HE J,ZHOU J,WANG H Y,et al.DACA:Distributed adaptive grid decision graph based clustering algorithm[J].Software:Practice and Experience,2022,52(5):1199-1215. [11]MELNYKOV V,WANG Y.Conditional mixture modeling andmodel-based clustering[J].Pattern Recognition,2023,133:108994. [12]LI L S,WAN Z Q,HE H B.Incomplete multi-view clustering with joint partition and graph learning[J].IEEE Transactions on Knowledge and Data Engineering,2021,35(1):589-602. [13]MACQUEEN J B.Some methods for classification and analysis of multivariate observations[C]//Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.1967:281-297. [14]BEZDEK J C,EHRLICH R,FULL W.FCM:The fuzzy c-means clustering algorithm[J].Computers and Geosciences,1984,10(2/3):191-203. [15]LI Z,TAO L.semi-supervised hierarchical clustering[C]//Proceedings of the 2011 IEEE 11th International Conference on Data Mining(ICDM).NW Washington:IEEE Computer Society,2011:982-991. [16]RODRIGUEZ A,LAIO A.Clustering by fast search and find of density peaks[J].Science,2014,344(6191):1492-1496. [17]NG A Y,JORDAN M I,WEISS Y.On spectral clustering:ana-lysis and an algorithm[C]// 14th International Conference on Neural Information Processing Systems:Natural and Synthetic.2001:849-856. [18]WANG L J,DING S F,JIA H J.An improvement of spectral clustering via message passing and density sensitive similarity[J].IEEE Access,2019,7:101054-101062. [19]LIU P J,LI H Y,WANG T Y,et al.Multi-stage Method for Online Vertical Data Partitioning Based on Spectral Clustering[J].Journal of Software,2023,34(6):2804-2832. [20]DING S F,JIA H J,DU M J,et al.p-Spectral clustering based on neighborhood attribute granulation[C]//Intelligent Information Processing VIII:9th IFIP TC 12 International Conference.Melbourne:pringer International Publishing,2016:50-58. [21]LIU X Y,LI J W,YU H,et al.Adaptive spectral clusteringbased on shared nearest neighbors[J].Journal of Chinese Computer Systems,2011,32(9):1876-1880. [22]ZHANG T,GE H W.Spectral Clustering Based on Density Coefficient and Shared Nearest Neighbors[J].Journal of Chinese Computer Systems,2017,38(8):1829-1833. [23]WANG Y R,DING S F,WANG L J,et al.An improved density-based adaptive p-spectral clustering algorithm[J].International Journal of Machine Learning and Cybernetics,2021,12:1571-1582. [24]FISCHER B,ROTH V,BUHMANN J.Clustering with the connectivity kernel[J].Advances in Neural Information Processing Systems,2003,16:89-96. [25]YANG P,ZHU Q S,HUANG B.Spectral clustering with density sensitive similarity function[J].Knowledge-Based Systems,2011,24(5):621-628. [26]BIAN Z,ISHIBUCHI H,WANG S T.Joint learning of spectral clustering structure and fuzzy similarity matrix of data[J].IEEE Transactions on Fuzzy Systems,2018,27(1):31-44. [27]MONNEY A,ZHAN Y Z,JIANG Z,et al.A multi-kernel me-thod of measuring adaptive similarity for spectral clustering[J].Expert Systems with Applications,2020,159:113570. [28]HEIN M,AUDIBERT J Y,LUXBURG U V.Graph Laplacians and their convergence on random neighborhood graphs[J].Journal of Machine Learning Research,2007,8(9):1325-1370. [29]FAN K.On a theorem of Weyl concerning eigenvalues of linear transformations I[J].Proceedings of the National Academy of Sciences,1949,35(11):652-655. [30]BOYD S.Advances in convex optimization:Interior-point me-thods,cone programming,andapplications[C]//Proceedings of the 41st IEEE Conference on Decision and Control.Las Vegas:IEEE,2002:1-49. [31]WU T F,TSAI P S,HU N T,et al.Combining turning point detection and Dijkstra's algorithm to search the shortest path[J].Advances in Mechanical Engineering,2017,9(2):1-12. |
|