计算机科学 ›› 2020, Vol. 47 ›› Issue (11): 80-87.doi: 10.11896/jsjkx.190900144

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于Huber损失的非负矩阵分解算法

王丽星1, 曹付元1,2   

  1. 1 山西大学计算机与信息技术学院 太原 030006
    2 山西大学计算智能与中文信息处理教育部重点实验室 太原 030006
  • 收稿日期:2019-09-23 修回日期:2020-02-08 出版日期:2020-11-15 发布日期:2020-11-05
  • 通讯作者: 曹付元(cfy@sxu.edu.cn)
  • 作者简介:1608689768@qq.com
  • 基金资助:
    国家自然科学基金(61573229,61976128);山西省重点研发计划项目(201803D31022);山西省留学基金项目(2016-003);山西省留学基金择优资助项目(2016-001)

Huber Loss Based Nonnegative Matrix Factorization Algorithm

WANG Li-xing1, CAO Fu-yuan1,2   

  1. 1 School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University),Ministry of Education,Taiyuan 030006,China
  • Received:2019-09-23 Revised:2020-02-08 Online:2020-11-15 Published:2020-11-05
  • About author:WANG Li-xing,born in 1992,postgradu-ate.Her main research interests include Subs pace Learning and NMF.
    CAO Fu-yuan,born in 1974,professor,is a member of China Computer Federatio.His main research interests include subspace learning and NMF.
  • Supported by:
    This work was supported by the National Natural Science Foundation (61573229,61976128), Shanxi Provincial Key Research and Development Program (201803D31022),Shanxi Scholarship Fund Project(2016-003) and Shanxi Scholarship Fund Selection Project (2016-001).

摘要: 非负矩阵分解( Nonnegative Matrix Factorization)算法能为原始数据找到非负的、线性的矩阵表示且保留了数据的本质特征,已被成功应用于多个领域。经典的NMF算法及其变体算法大部分使用均方误差函数来度量重建误差,在许多任务中已经显示出其有效性,但它在处理含有噪声的数据时仍然面临一些困难。Huber损失函数对较小的残差执行的惩罚与均方误差损失函数相同,对较大的残差执行的惩罚是线性增长的,因此与均方误差损失函数相比,Huber损失函数具有更强的鲁棒性;已有研究证明L2,1范数稀疏正则项在机器学习的分类和聚类模型中具有特征选择作用。结合两者的优点,文中提出了一种基于Huber损失函数且融入L2,1范数正则项的非负矩阵分解聚类模型,并给出了基于投影梯度更新规则的优化过程。在多组数据集上将所提算法与经典的多种聚类算法进行对比,实验结果验证了所提算法的有效性。

关键词: L2,1范数, Huber损失函数, 非负矩阵分解, 投影梯度法

Abstract: Non-negative matrix factorization (NMF) algorithm can find a non-negative and linear matrix representation and retains the essential characteristics of the original data,it has been successfully applied to many fields.The classical NMF algorithm and its variant algorithms mostly use the mean square error function to measure the reconstruction error,which has been shown to be effective in many tasks,but it still faces some difficulties in dealing with noise-containing data.The Huber loss function performs the same penalty for the smaller residual as the mean square error loss function,and the penalty for the larger residual is linearly grown,so the Huber loss function is more robust than the mean square error loss function.It has been proved that the L2,1 norm sparse regularization term is a feature selection function in the classification and clustering model of machine learning.Therefore,combining the advantages of the two,a non-negative matrix factorization clustering model based on Huber loss function and incorporating L2,1 norm regularization term is proposed,and an effective optimization procedure based on projected gradient method to update variables is given.Compared with the classical NMF multi-clustering algorithm on multiple sets of datasets,the experimental results show the effectiveness of the proposed algorithm.

Key words: L2,1 norm, Huber loss function, Nonnegative matrix factorization, Projected gradient method

中图分类号: 

  • TP3-05
[1] LU H T,FU Z Y,SHU X.Non-negative and sparse spectralclustering [J].Pattern Recognition,2014,47(1):418-426.
[2] LEE D D,SEUNG H S.Learning the parts of objects by non-negative matrix factorization [J].Nature,1999,401:788-791.
[3] LEE D D,SEUNG H S.Algorithms for non-negative matrix factorization[C]//NIPS.2000:535-541.
[4] LI M J,XIE Q,DING Q L.Orthogonal Non-negative Matrix Factorization for K-means Clustering [J].Computer Science,2016,43(5):204-208.
[5] LIN C J.Projected gradient methods for non-negative matrixfactorization [J].Neural Computation,2007,19(10):2756-2779.
[6] HOYER P O.Non-negative matrix factorization with sparseness constraints [J].Journal of Machine Learning Research,2004,5(9):1457-1469.
[7] CAI D,HE H,HAN J.Graph regularized nonnegative matrixfactorization for data representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2011,33(8):1548-1560.
[8] JIANG W,LI H,YU X,et al.Graph Regularized Non-negative Matrix Factorization with Sparseness Constraints [J].Computer Science,2013,40(1):218-220,256.
[9] LIU H,WU Z,LI X.Constrained nonnegative matrix factorization for image representation [J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2012,34(7):1299-1311.
[10] KONG D,DING C,HUANG H.Robust nonnegative matrix factorization using L21-norm[C]//Proceedings of the 20th ACM CIKM.2011:673-682.
[11] DU L,LI X,SHEN Y D.Robust nonnegative matrix factorization via half-quadratic minimization[C]//IEEE.ICDM,2012:201-210.
[12] YANG S,HOU C,ZHANG C,et al.Robust non-negative matrix factorization via joint sparse and graph regularization[C]//International Joint Conference on Neural Networks.2013:1-5.
[13] KONG D,DING C,HUANG H.Robust nonnegative matrix factorization using l21-norm[C]//The 20th ACM International Conference on Information and Knowledge Management.2011:673-682.
[14] NIE F P,HUANG H,CAI X,et al.Efficient and robust feature selection via L2,1-norms minimization [C]//Proceedings of International Conference on Neural Information Processing Systems.British,ACM,2010:1813-1821.
[15] CALAMAI P H,MORÈ J J.Projected gradient methods for linearly constrained problems [J].Mathematical Programming,1987,39(1):93-116.
[16] HUBER P J.Robust Statistics (second edition) [M].NewJersey:John Wiley & Sons,2009:1-5.
[17] ANDREW S,TSOCHANTARIDIS I T.HOFMANN. Supportvector machines for multiple-instance learning[C]//Advances in Neural Information Processing Systems.USA:The MIT Press,2003:577-584.
[18] TOLIC D,ANTULOV F N,KOPRVIA I.A nonlinear orthogonal non-negative matrix factorization approach to subspace clustering [J].Pattern Recognition,2018,82(10):40-55.
[19] NIE F P,WANG X Q,HUANG H.Clustering and projectedclustering with adaptive neighbors[C]//ACM SIGKDD Conference on Knowledge Discovery and Data Mining.New York,ACM,2014:977-986.
[20] DUA D,GRAFF C.UCI Machine Learning Repository[OL].http://archive.ics.uci.edu/ml.
[1] 官铮, 邓扬琳, 聂仁灿.
光谱重建约束非负矩阵分解的高光谱与全色图像融合
Non-negative Matrix Factorization Based on Spectral Reconstruction Constraint for Hyperspectral and Panchromatic Image Fusion
计算机科学, 2021, 48(9): 153-159. https://doi.org/10.11896/jsjkx.200900054
[2] 段菲, 王慧敏, 张超.
面向数据表示的Cauchy非负矩阵分解
Cauchy Non-negative Matrix Factorization for Data Representation
计算机科学, 2021, 48(6): 96-102. https://doi.org/10.11896/jsjkx.200700195
[3] 李雨蓉, 刘杰, 刘亚林, 龚春叶, 王勇.
面向语音分离的深层转导式非负矩阵分解并行算法
Parallel Algorithm of Deep Transductive Non-negative Matrix Factorization for Speech Separation
计算机科学, 2020, 47(8): 49-55. https://doi.org/10.11896/jsjkx.190900202
[4] 李向利, 贾梦雪.
基于预处理的超图非负矩阵分解算法
Nonnegative Matrix Factorization Algorithm with Hypergraph Based on Per-treatments
计算机科学, 2020, 47(7): 71-77. https://doi.org/10.11896/jsjkx.200200106
[5] 周昌, 李向利, 李俏霖, 朱丹丹, 陈世莲, 蒋丽榕.
基于余弦相似度的稀疏非负矩阵分解算法
Sparse Non-negative Matrix Factorization Algorithm Based on Cosine Similarity
计算机科学, 2020, 47(10): 108-113. https://doi.org/10.11896/jsjkx.190700112
[6] 康林瑶, 唐兵, 夏艳敏, 张黎.
基于GPU加速和非负矩阵分解的并行协同过滤推荐算法
GPU-accelerated Non-negative Matrix Factorization-based Parallel Collaborative Filtering Recommendation Algorithm
计算机科学, 2019, 46(8): 106-110. https://doi.org/10.11896/j.issn.1002-137X.2019.08.017
[7] 何孝文, 胡一飞, 王海平, 陈默.
在线学习非负矩阵分解
Online Learning Nonnegative Matrix Factorization
计算机科学, 2019, 46(6A): 473-477.
[8] 黄梦婷, 张灵, 姜文超.
基于流形正则化的多类型关系数据联合聚类方法
Multi-type Relational Data Co-clustering Approach Based on Manifold Regularization
计算机科学, 2019, 46(6): 64-68. https://doi.org/10.11896/j.issn.1002-137X.2019.06.008
[9] 黄梦婷, 张灵, 姜文超.
基于非负矩阵分解的短文本特征扩展与分类
Short Text Feature Expansion and Classification Based on Non-negative Matrix Factorization
计算机科学, 2019, 46(12): 69-73. https://doi.org/10.11896/jsjkx.190400107
[10] 贾旭, 孙福明, 李豪杰, 曹玉东.
基于有监督双正则NMF的静脉识别算法
Vein Recognition Algorithm Based on Supervised NMF with Two Regularization Terms
计算机科学, 2018, 45(8): 283-287. https://doi.org/10.11896/j.issn.1002-137X.2018.08.051
[11] 于晓,聂秀山,马林元,尹义龙.
基于短空时变化的鲁棒视频哈希算法
Robust Video Hashing Algorithm Based on Short-term Spatial Variations
计算机科学, 2018, 45(2): 84-89. https://doi.org/10.11896/j.issn.1002-137X.2018.02.014
[12] 邹丽, 蔡希彪, 孙静, 孙福明.
基于双图正则的半监督NMF混合像元解混算法
Hyperspectral Unmixing Algorithm Based on Dual Graph-regularized Semi-supervised NMF
计算机科学, 2018, 45(12): 251-254. https://doi.org/10.11896/j.issn.1002-137X.2018.12.041
[13] 杨美姣,刘惊雷.
基于Nystrm采样和凸NMF的偏好聚类
Preference Clustering Based on Nystrm Sampling and Convex-NMF
计算机科学, 2018, 45(1): 55-61. https://doi.org/10.11896/j.issn.1002-137X.2018.01.008
[14] 李鹏,李英乐,王凯,何赞园,李星,常振超.
基于交互行为和连接分析的社交网络社团检测
Community Detection Based on User Interaction and Link Analysis in Social Networks
计算机科学, 2017, 44(7): 197-202. https://doi.org/10.11896/j.issn.1002-137X.2017.07.035
[15] 孙静,蔡希彪,姜小燕,孙福明.
基于图正则化和稀疏约束的增量型非负矩阵分解
Graph Regularized and Incremental Nonnegative Matrix Factorization with Sparseness Constraints
计算机科学, 2017, 44(6): 298-305. https://doi.org/10.11896/j.issn.1002-137X.2017.06.053
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!