Computer Science (计算机科学), 2019, Vol. 46, Issue (6): 64-68. doi: 10.11896/j.issn.1002-137X.2019.06.008

• Big Data & Data Science •

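Multi-type Relational Data Co-clustering Approach Based on Manifold Regularization

HUANG Meng-ting, ZHANG Ling, JIANG Wen-chao

  1. (School of Computers, Guangdong University of Technology, Guangzhou 510006, China)
  • Received: 2018-05-06 Published: 2019-06-24
  • Corresponding author: JIANG Wen-chao (born 1977), male, Ph.D., lecturer. His main research interests include cloud computing, high-performance computing, and distributed systems. E-mail: june4567@21cn.com
  • About the authors: HUANG Meng-ting (born 1994), female, master's student, CCF member. Her main research interests include data mining and analysis. ZHANG Ling (born 1968), female, Ph.D., professor. Her main research interests include intelligent information processing, automation equipment, artificial intelligence, and computer vision.
  • Funding:
    Supported by the Natural Science Foundation of Guangdong Province (2016A030313703), the Science and Technology Planning Project of Guangdong Province (2016B030305002, 2017B030305003, 2017B010124001), and the Industry-University-Research Cooperation Project of Guangdong Province (2017B090901005).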

Abstract: As big data applications develop, multi-type relational data sampled from nonlinear manifolds keep growing in scale, their geometric structure becomes more complicated, and heterogeneous relational data become extremely sparse. As a result, data mining becomes more difficult and less accurate. To address this problem, this paper proposes a co-clustering approach for multi-type relational data based on manifold nonnegative matrix tri-factorization (MNMTF). First, for the smaller-scale entity type, a correlation matrix is constructed from the entities' natural relationships or content relevance and is decomposed to obtain the cluster indicator matrix of that entity type, which is used as the input of nonnegative matrix tri-factorization. Then, manifold regularization is added on top of fast nonnegative matrix tri-factorization (FNMTF) to jointly cluster the inter-type and intra-type relationships, further improving clustering accuracy. Experiments show that MNMTF outperforms traditional co-clustering algorithms based on nonnegative matrix factorization in both accuracy and overall performance.

Key words: Multi-type relational data, Manifold regularization, Nonnegative matrix factorization, Correlation matrix
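
For orientation, a manifold-regularized tri-factorization objective is commonly written in the generic form below. This is only a sketch under stated assumptions, not the exact formulation of the paper, which the abstract does not give. All symbols are assumed notation: R is the relation matrix between two entity types, F and G are their cluster indicator matrices, S is the cluster association matrix, W_F and W_G are nearest-neighbor affinity graphs built within each type, D_F and D_G are the corresponding degree matrices, and λ and μ are trade-off weights.

```latex
\min_{F,\,S,\,G}\;
  \lVert R - F S G^{\top} \rVert_F^{2}
  + \lambda \,\operatorname{tr}\!\left(F^{\top} L_F F\right)
  + \mu \,\operatorname{tr}\!\left(G^{\top} L_G G\right),
\qquad L_F = D_F - W_F,\quad L_G = D_G - W_G
```

The first term fits the inter-type relationships. For a symmetric affinity graph, each trace term equals half the weighted sum of squared differences between the cluster assignments of neighboring entities, e.g. tr(G^T L_G G) = (1/2) Σ_ij (W_G)_ij ‖g_i − g_j‖², so it penalizes placing intra-type neighbors in different clusters. In FNMTF-style variants, F and G are usually further constrained to be binary indicator matrices with a single 1 per row; whether MNMTF keeps exactly that constraint is an assumption here.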

CLC Number:

  • TP391