Computer Science ›› 2021, Vol. 48 ›› Issue (8): 47-52. doi: 10.11896/jsjkx.201000106

• Database & Big Data & Data Science •

  • Corresponding author: ZHOU Peng (zhoupeng@ahu.edu.cn)

Multiple Kernel Clustering via Local Regression Integration

DU Liang1,2, REN Xin1, ZHANG Hai-ying1, ZHOU Peng3   

  1 School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
    2 Institute of Big Data Science and Industry, Shanxi University, Taiyuan 030006, China;
    3 School of Computer Science and Technology, Anhui University, Hefei 230601, China
  • Received: 2020-10-18  Revised: 2021-01-29  Published: 2021-08-10
  • About author: DU Liang, born in 1985, Ph.D, is a member of China Computer Federation. His main research interests include machine learning and data mining (im.duliang@qq.com).
    ZHOU Peng, born in 1989, Ph.D, is a member of China Computer Federation. His main research interests include data mining and machine learning.
  • Supported by: National Natural Science Foundation of China (61976129, 61806003).

Abstract: Existing multiple kernel clustering methods seldom consider the local manifold structure of multiple kernel data, and they learn many parameters during multiple kernel fusion, which leaves them vulnerable to noise and outliers among the candidate kernels. To address these problems, this paper first proposes a clustering method based on local kernel regression (CKLR), which characterizes the manifold structure of single-kernel data through local learning and uses sparse local kernel regression coefficients for prediction and clustering. The paper further proposes a multiple kernel clustering method based on the fusion of single-kernel local regressions (CMKLR). It constructs a sparse local kernel regression coefficient matrix for each kernel matrix, and obtains, through globally linearly weighted fusion, both the local manifold structure of the multiple kernel data and an equally sparse multiple kernel local regression coefficient. The proposed method thus avoids the two drawbacks of existing approaches, and involves only a single hyperparameter, the local neighborhood size. Experimental results show that the proposed method outperforms current mainstream multiple kernel clustering methods on the test data sets.


Abstract: Existing multiple kernel methods rarely consider the intrinsic manifold structure of multiple kernel data and estimate the consensus kernel matrix with a quadratic number of variables, which makes them vulnerable to the noise and outliers within multiple candidate kernels. This paper first presents a clustering method via kernelized local regression (CKLR), which captures the local structure of kernel data and employs kernel regression on the local region to predict the clustering results. The paper then extends it to perform clustering via multiple kernel local regression (CMKLR). We construct a kernel-level local regression sparse coefficient matrix for each candidate kernel, which well characterizes the kernel-level manifold structure. We then aggregate all the kernel-level local regression coefficients via linear weights and generate the consensus sparse local regression coefficient, which largely reduces the number of variables and is more robust against noise and outliers within the multiple kernel data. Thus, the proposed CMKLR avoids the above two limitations, and it contains only one additional hyperparameter, the local neighborhood size, to tune. Extensive experimental results show that the clustering performance of the proposed method on benchmark data sets is better than that of 10 state-of-the-art multiple kernel clustering methods.

Key words: Local learning, Local regression, Multiple kernel clustering
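The general idea summarized in the abstract can be illustrated with a small sketch: fit a local kernel regression around each sample to obtain sparse coefficients, then linearly weight the per-kernel coefficient matrices into one consensus affinity. This is a toy reconstruction under stated assumptions, not the authors' published algorithm: the kernel-induced neighbor search, the ridge regularization term, and the uniform kernel weights are all choices made for the example (the weights here are simply fixed rather than learned).

```python
import numpy as np

def local_kernel_affinity(K, n_neighbors=5, ridge=1e-3):
    """Illustrative sketch: for each sample, solve a small kernel ridge
    system on its neighborhood and keep the coefficients as a sparse
    affinity row. Neighbors are found with the kernel-induced squared
    distance d(i, j)^2 = K_ii + K_jj - 2 * K_ij."""
    n = K.shape[0]
    diag = np.diag(K)
    dist = diag[:, None] + diag[None, :] - 2.0 * K
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[:n_neighbors]
        Kb = K[np.ix_(nbrs, nbrs)]          # neighborhood kernel block
        w = np.linalg.solve(Kb + ridge * np.eye(n_neighbors), K[nbrs, i])
        A[i, nbrs] = w                      # sparse local coefficients
    return 0.5 * (A + A.T)                  # symmetrize for clustering

def consensus_affinity(kernels, weights=None, n_neighbors=5):
    """Linearly weight the per-kernel local affinities into one consensus
    affinity (uniform weights here, purely for illustration)."""
    if weights is None:
        weights = np.ones(len(kernels))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(w * local_kernel_affinity(K, n_neighbors)
               for w, K in zip(weights, kernels))

# Toy usage: two Gaussian blobs, a linear kernel and an RBF kernel.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-sq / 2.0)
A = consensus_affinity([K_lin, K_rbf], n_neighbors=5)
print(A.shape)  # prints (40, 40)
```

With the consensus affinity in hand, a standard spectral clustering step on `A` would yield the final partition.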

CLC Number: TP181