Computer Science, 2019, Vol. 46, Issue (6): 64-68. doi: 10.11896/j.issn.1002-137X.2019.06.008

• Big Data and Data Science* •

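Multi-type Relational Data Co-clustering Approach Based on Manifold Regularization

HUANG Meng-ting, ZHANG Ling, JIANG Wen-chao

  (School of Computers, Guangdong University of Technology, Guangzhou 510006, China)
  • Received: 2018-05-06  Published: 2019-06-24
  • Corresponding author: JIANG Wen-chao (1977-), male, Ph.D., lecturer; research interests: cloud computing, high-performance computing, and distributed systems. E-mail: june4567@21cn.com
  • About the authors: HUANG Meng-ting (1994-), female, master's student, CCF member; research interests: data mining and analysis. ZHANG Ling (1968-), female, Ph.D., professor; research interests: intelligent information processing, automation equipment, artificial intelligence, and computer vision.
  • Funding: Supported by the Natural Science Foundation of Guangdong Province (2016A030313703), the Science and Technology Planning Project of Guangdong Province (2016B030305002, 2017B030305003, 2017B010124001), and the Industry-University-Research Cooperation Project of Guangdong Province (2017B090901005).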

Abstract: With the development of big data applications, multi-type relational data sampled from nonlinear manifolds keep growing in scale, their geometric structure becomes more complicated, and the heterogeneous relations among them become extremely sparse. As a result, data mining becomes harder and less accurate. To address this problem, this paper proposes a manifold nonnegative matrix tri-factorization (MNMTF) approach for multi-type relational data co-clustering. First, for the smaller-scale entity types, a correlation matrix is constructed from their natural relationships or content relevance and then factorized to obtain the cluster indicator matrix of that entity type, which serves as an input to nonnegative matrix tri-factorization. Then, manifold regularization is added on top of fast nonnegative matrix tri-factorization (FNMTF), so that inter-type relationships and intra-type relationships are clustered jointly, further improving clustering accuracy. Experiments show that MNMTF outperforms traditional co-clustering algorithms based on nonnegative matrix factorization in both accuracy and overall performance.
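A manifold-regularized tri-factorization objective of the kind described above can be written as follows. This form assumes K entity types, inter-type relation matrices $R_{ij}\in\mathbb{R}_{+}^{n_i\times n_j}$, FNMTF-style hard cluster-indicator matrices $G_i$ (row $g_a$ of $G_i$ marks the cluster $c(a)$ of object $a$), and an intra-type affinity graph $W_i$ with Laplacian $L_i=D_i-W_i$, $D_i=\operatorname{diag}(W_i\mathbf{1})$. The weights $\lambda_i$ and the graph construction are assumptions; the paper's exact MNMTF objective may differ in detail.

```latex
\begin{aligned}
&\min_{\{G_i\},\{S_{ij}\}}\ \sum_{1\le i<j\le K}\bigl\lVert R_{ij}-G_iS_{ij}G_j^{\top}\bigr\rVert_F^2
  +\sum_{i=1}^{K}\lambda_i\operatorname{tr}\bigl(G_i^{\top}L_iG_i\bigr),
  \qquad G_i\in\{0,1\}^{n_i\times c_i},\ G_i\mathbf{1}=\mathbf{1},\\[4pt]
&S_{ij}=(G_i^{\top}G_i)^{-1}G_i^{\top}R_{ij}G_j(G_j^{\top}G_j)^{-1}
  \quad\text{(with all } G \text{ fixed; block means of } R_{ij}\text{ for nonempty clusters)},\\[4pt]
&\operatorname{tr}\bigl(G_i^{\top}L_iG_i\bigr)
  =\tfrac12\sum_{a,b}(W_i)_{ab}\,\lVert g_a-g_b\rVert^2
  =\sum_{a,b}(W_i)_{ab}\,[\,c(a)\neq c(b)\,].
\end{aligned}
```

Because $G_i^{\top}G_i$ is diagonal with the cluster sizes on its diagonal, the $S$-step needs no general matrix inverse. The last identity (for symmetric $W_i$) shows that, with hard indicators, the manifold term charges the affinity weight of every graph edge cut between two clusters. This is what keeps neighbouring objects of the same type in the same cluster. For $K=2$ and $R_{12}=X$, the objective reduces to ordinary two-type co-clustering.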

Key words: Correlation matrix, Manifold regularization, Multi-type relational data, Nonnegative matrix factorization
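The alternating scheme behind FNMTF-style co-clustering with a manifold penalty added can be sketched for the simplest two-type case: the rows and columns of one nonnegative relation matrix X. This is a minimal sketch under stated assumptions, not the authors' MNMTF update rules, which the abstract does not give. F and G are hard cluster indicators as in FNMTF, S is refit as block means, and each type gets an unnormalised kNN-graph Laplacian penalty. The resulting objective is ||X - F S G^T||_F^2 + mu_r tr(F^T L_r F) + mu_c tr(G^T L_c G). The function names, the kNN graph, the sequential (ICM-style) relabelling, the random initialisation, and the parameters `mu_r`, `mu_c` and `knn` are all hypothetical choices. The correlation-matrix step for small entity types is not modelled.

```python
import numpy as np


def knn_graph(points, k):
    """Symmetric 0/1 kNN affinity matrix with zero diagonal (assumes k < number of points)."""
    points = np.asarray(points, dtype=float)
    sq = np.sum(points ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, np.inf)                  # never pick a point as its own neighbour
    nn = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros_like(dist)
    w[np.repeat(np.arange(len(points)), k), nn.ravel()] = 1.0
    return np.maximum(w, w.T)                       # keep an edge if either endpoint chose it


def laplacian(w):
    """Unnormalised graph Laplacian L = D - W."""
    return np.diag(w.sum(axis=1)) - w


def indicator(labels, k):
    """Binary cluster-indicator matrix with exactly one 1 per row."""
    g = np.zeros((labels.size, k))
    g[np.arange(labels.size), labels] = 1.0
    return g


def block_means(x, r, c, kr, kc):
    """S-step: with indicator F, G the least-squares S is the matrix of block means."""
    f, g = indicator(r, kr), indicator(c, kc)
    counts = np.outer(np.maximum(f.sum(axis=0), 1), np.maximum(g.sum(axis=0), 1))
    return (f.T @ x @ g) / counts                   # empty clusters get zeros (they are unused)


def reassign(points, centers, labels, w, mu):
    """ICM-style update: reconstruction error plus the label-dependent part of mu*tr(G^T L G).

    For symmetric W with zero diagonal, that part is 2 * mu * (edge weight from this
    object to neighbours currently outside the candidate cluster).
    """
    k = centers.shape[0]
    for a in range(points.shape[0]):
        fit = np.sum((centers - points[a]) ** 2, axis=1)
        cut = w[a].sum() - np.bincount(labels, weights=w[a], minlength=k)
        labels[a] = int(np.argmin(fit + 2.0 * mu * cut))
    return labels


def graph_regularized_fnmtf(x, kr, kc, mu_r=0.1, mu_c=0.1, knn=3, n_iter=20, seed=0):
    """Hypothetical two-type sketch: X ~ F S G^T with hard F, G and kNN-graph penalties."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w_r, w_c = knn_graph(x, knn), knn_graph(x.T, knn)
    r, c = rng.integers(kr, size=x.shape[0]), rng.integers(kc, size=x.shape[1])
    history = []
    for _ in range(n_iter):
        s = block_means(x, r, c, kr, kc)
        c = reassign(x.T, (indicator(r, kr) @ s).T, c, w_c, mu_c)   # column (G) step
        s = block_means(x, r, c, kr, kc)
        r = reassign(x, s @ indicator(c, kc).T, r, w_r, mu_r)       # row (F) step
        s = block_means(x, r, c, kr, kc)
        f, g = indicator(r, kr), indicator(c, kc)
        history.append(float(np.sum((x - f @ s @ g.T) ** 2)
                             + mu_r * np.trace(f.T @ laplacian(w_r) @ f)
                             + mu_c * np.trace(g.T @ laplacian(w_c) @ g)))
    return r, c, history
```

Each step minimises the objective over one block of variables with the others fixed, so the `history` returned by `r, c, history = graph_regularized_fnmtf(x, kr=2, kc=2)` is non-increasing. As with other hard-assignment alternating methods, the result can still be a local optimum that depends on `seed`.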

CLC number: TP391