基于Nystrm采样和凸NMF的偏好聚类

doi:10.11896/j.issn.1002-137X.2018.01.008

Abstract

Abstract: Large-scale sparse graph data appear in reality heavily,for example,collaborative graph,Laplacian matrix,and so on.Non-negative matrix factorization (NMF) has become a useful tool in data mining,information retrieval and signal processing.How to achieve the data clustering in large-scale data is an important issue.This paper used the two-stage method to realize data clustering.First of all,the Nystrm approximate sampling method is used.The initial profile of data is obtained from large data,and the similar matrix of user-user or movie-movie is obtained.The purpose of doing that is to reduce the original high dimensional space to a low dimensional subspace.Then convex non-negative matrix decomposition of low dimensional similarity matrix is used to get the center of the cluster and indicator.The center of the cluster represents the features of movies or users,and the indicator represents the weight of the features of mo-vies or users.The advantage of two-stage preference clustering method is that the approximation of initial data contour and convex non-negative matrix factorization have better robustness and anti-noise.On the other hand,the data of the subspace are derived from the real matrix,which makes the results of clustering preference have good interpretability.This paper utilized Nystrm method to solve the problem that large-scale data cannot be stored in the memory,saving memory and improving operation efficiency.Lastly,the test on movie data sets which containing 100000 ratings shows the effectiveness of the clustering algorithm.

Key words: Nystrm method,Convex non-negative matrix factorization,Preference clustering,Clustering center,Clustering indicator

YANG Mei-jiao and LIU Jing-lei. Preference Clustering Based on Nystrm Sampling and Convex-NMF[J].Computer Science, 2018, 45(1): 55-61.

References

[1] WANG L,BEZDEK J C,LECKIE C.Selective sampling for approximate clustering of very large data sets[J].International Journal of Intelligent Systems,2008,3(3):313-331.
[2] XU R,II D C W.IEEE,Survey of clustering algorithms[J].IEEE Transactions on Neural Networks,2005,16(3):645-678.
[3] SUN J G,LIU J,ZHAO L Y.Clustering algorithm research[J].Journal of Software,2008,19(1):48-61.(in Chinese) 孙吉贵,刘杰,赵连宇.聚类算法研究[J].软件学报,2008,19(1):48-61.
[4] DING S F,JIA H J,SHI Z Z.Spectral clustering algorithm for large data spectrum based on adaptive Nystrm sampling[J].Journal of Software,2014,25(9):2037-2049.(in Chinese) 丁世飞,贾洪杰,史忠植.基于自适应Nystrm采样的大数据谱聚类算法[J].软件学报,2014,25(9):2037-2049.
[5] SUN S,ZHAO J,ZHU J.A review of Nystrm methods for large-scale machine learning[J].Information Fusion,2015,26(1):36-48.
[6] WANG S,ZHANG Z.Improving CUR matrix decompositionand the Nystrm approximation via adaptive sampling [J].Journal of Machine Learning Research,2013,14(1):2729-2769.
[7] CHEN X,CAI D.Large scale spectral clustering with landmark-based representation [C]∥AAAI Conferenceon Artificial Intelligence(AAAI 2011).San Francisco,California,USA,2011:55-115.
[8] WANG W W,LI X P,FENG X C.Summarization of sparse subspaces [J].Journal of Acta Automatica Sinica,2015,1(8):1373-1384.(in Chinese) 王卫卫,李小平,冯象初.稀疏子空间聚类综述[J].自动化学报,2015,41(8):1373-1384.
[9] WANG S,LUO L,ZHANG Z.SPSD matrix approximation vis column selection:theories,algorithms,and extensions [J].Journal of Machine Learning Research,2016,17(5):1-49.
[10] WANG S,ZHANG Z,ZHANG T.Towards more efficient SPSD matrix approximation and CUR matrix decomposition [J].Journal of Machine Learning Research,2016,17(210):1-49.
[11] ZHU L,LEI J S,BI Z Q.A soft subspace clustering algorithm based on data stream[J].Journal of Software,2013,24(11):2610-2627.(in Chinese) 朱林,雷景生,毕忠勤.一种基于数据流的软子空间聚类算法[J].软件学报,2013,24(11):2610-2627.
[12] BACH F R,JORDAN M I.Learning spectral clustering [J].Advances in Neural Information Processing Systems,2004,16(2):2006.
[13] DEMPSTER A.Maximum likelihood from incomplete data via theEM algorithm [J].Journal of the Royal Statistical Society,1977,39(1):1-38.
[14] SPIELMAN D A.Spectral graph theory and its applications [C]∥IEEE Symposium on Foundations of Computer Science.2007:29-38.
[15] CAI X Y,DAI G Z,YANG L B.A review of spectral clustering algorithm [J].Journal of Computer Science,2008,35(7):14-18.(in Chinese) 蔡晓妍,戴冠中,杨黎斌.谱聚类算法综述[J].计算机科学,2008,35(7):14-18.
[16] ALZATE C,SUYKENS J A K.Multiway Spectral Clusteringwith Out-of-Sample Extensions through Weighted Kernel PCA [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(2):335-347.
[17] LAUER F,SCHNOOR C.Spectral clustering of linear sub-spaces for motion segmentation [J].IEEE International Confe-rence on Computer Vision,2009,30(2):678-685.
[18] JIA H,DING S,DU M.Approximate normalized cuts without eigen-decomposition [J].Journal of Information Sciences,2016,374(5):135-150.
[19] SHI J,MALIK J.Normalised cuts and image segmenerarion[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2000,22(8):888-905.
[20] DING C,LI T,JORDAN M I.Convex and semi-nonnegative matrix factorizations[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2010,32(1):45-55.

Metrics

Viewed

Full text

Abstract

Cited

Shared

Discussed

Comments

Recommended 0

No Suggested Reading articles found!

Preference Clustering Based on Nystrm Sampling and Convex-NMF

PDF (PC)

Abstract

Cite this article

share this article

References

Related Articles 0

Metrics

Comments

Recommended 0