计算机科学 ›› 2021, Vol. 48 ›› Issue (6): 96-102.doi: 10.11896/jsjkx.200700195

• 数据库&大数据&数据科学 • 上一篇    下一篇

面向数据表示的Cauchy非负矩阵分解

段菲1,2, 王慧敏1, 张超1,2   

  1. 1 山西大学计算机与信息技术学院 太原030006
    2 山西大学计算智能与中文信息处理教育部重点实验室 太原030006
  • 收稿日期:2020-07-30 修回日期:2020-10-05 出版日期:2021-06-15 发布日期:2021-06-03
  • 通讯作者: 段菲(fduan@163.com)
  • 基金资助:
    国家自然科学基金(61806116,61976128,61972238,61802237);山西省重点研发计划项目(国际科技合作,201903D421041);山西省自然科学基金面上项目(201801D221175,201901D111035,201901D211176);山西省高等学校优秀成果培育项目(2019SK036);山西省高等学校青年科研人员培育计划;山西省高等学校科技创新项目(2019L0066);2020年小店区-山西大学产学研合作项目;山西省1331工程项目

Cauchy Non-negative Matrix Factorization for Data Representation

DUAN Fei1,2, WANG Hui-min1, ZHANG Chao1,2   

  1. 1 School of Computer and Information Technology,Shanxi University,Taiyuan 030006,China
    2 Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University),Ministry of Education,Taiyuan 030006,China
  • Received:2020-07-30 Revised:2020-10-05 Online:2021-06-15 Published:2021-06-03
  • About author:DUAN Fei,born in 1979,Ph.D,lectu-rer,is a member of China Computer Fe-deration.His main research interests include computer vision and machine learning.
  • Supported by:
    National Natural Science Foundation of China(61806116,61976128,61972238,61802237),Key R&D program of Shanxi Province(International Cooperation,201903D421041),Natural Science Foundation of Shanxi Province(201801D221175,201901D111035,201901D211176),Cultivate Scientific Research Excellence Programs of Higher Education Institutions in Shanxi(CSREP)(2019SK036),Training Program for Young Scientific Researchers of Higher Education Institutions in Shanxi,Scientific and Technological Innovation Programs of Higher Education Institutions in Shanxi(STIP)(2019L0066),Industry-University-Research Collaboration Program Between Shanxi University and Xiaodian District and the 1331 Engineering Project of Shanxi Province,China.

摘要: 非负矩阵分解(Non-negative Matrix Factorization,NMF)是一类广泛应用于数据挖掘和机器学习领域的重要矩阵分解模型,可从一组高维非负向量中提取出低维、稀疏和有意义的特征。标准NMF利用Frobenius范数的平方度量重建误差,虽然在一些应用场景中表现出一定的有效性,但对非高斯噪声和离群点较为敏感。由于现实世界中的真实数据不可避免地包含各种噪声,因此有必要对非高斯噪声和离群点较为稳健的非负矩阵分解模型进行研究。为此,文中提出用Cauchy估计函数取代标准NMF中的平方形式的残差。在度量样本重建误差时,充分考虑样本特征不同维度之间的相关性,以样本的重建误差作为基本的重建误差度量单元。此外,基于半二次规划推导了高效的乘性更新规则,用于求解所提出的模型。在3个真实人脸图像库上的聚类实验中验证了所提模型和算法的有效性。实验结果表明,所提算法对人脸姿态、光照和表情变化均表现出一定的稳健性,且聚类结果对参数的依赖性较小。

关键词: Cauchy损失函数, 非负矩阵分解, 聚类, 数据表示

Abstract: As an important matrix factorization model,non-negative matrix factorization(NMF) is widely used in the fields of data mining and machine learning.It is often used to extract low-dimensional,sparse,and meaningful features from a collection of non-negative data vectors.Although effective in some scenarios,standard NMF utilizes squared Frobenius norm to quantify the reconstruction residual,which make it very sensitive to non-gaussian noises and outliers.As many real data inevitably contain various kind of noises,it is desirable to use robust version of NMF which is insensitive to non-gaussian noise and outliers.This paper proposes to use the Cauchy function to measure the quality of approximation instead of squared Euclidean distance for each sample and take the dependencies between different feature dimensions into account.Based on the theory of half-quadratic programming,this paper derives multiplicative updating rules to solve the proposed model effectively.To verify the effectiveness of the proposed approach,extensive unsupervised clustering experiments are conducted on several benchmark face image datasets.The experimental results show that the proposed model is robust to head poses variations,lighting,and emotion changes.Further,our model achieves consistently good performance when the parameter c varies in a large range on all three benchmark datasets.

Key words: Cauchy loss, Clustering, Data representation, Nonnegative matrix factorization

中图分类号: 

  • TP391
[1]LEE D,SEUNG H S.Algorithms for Non-negative Matrix Factorization [C]//Advances in Neural Information Processing System.2001:556-562.
[2]PAATERO P,TAPPER U.Positive matrix factorization:a non-negative factor model with optimal utilization of error estimates of data values[J].Environmetrics,1994,5(2):111-126.
[3]LEE D,SEUGN H S.Learning the parts of objects by non-negative matrix factorization[J].Nature,1999,401:788-791.
[4]LI S,HOU X,ZHANG H.et al.Learning spatially localized,parts-based representation[C]//International Conference on Computer Vision and Pattern Recognition.2001:207-212.
[5]WANG H,HUANG H,DING C,et al.Predicting protein-protein interactions from multimodal biological data sources via non-negative matrix factorization[C]//Proceedings of International Conference on Research in Computational Molecular Bio-logy.2012:774-783.
[6]COOPER M,FOOTE J.Summarizing video using non-negativesimilarity matrix factorization[C]//Proceedings of IEEE Workshop on Multimedia Signal Processing.2002:25-28.
[7]WANG H,HUANG H,DING C.Cross-language web page classification via joint non-negative matrix tri-factorization based dyadic knowledge transfer[C]//Proceedings of Annual ACM SIGIR Conference.2011:933-942.
[8]XU W,LIU X,GONG Y.Document clustering based on non-negative matrix factorization[C]//Proceedings of the 26th ACM SIGIR.2003:267-273.
[9]LI J,ZHOU G,QIU Y,et al.Deep graph regularized non-negative matrix factorization for multi-view clustering[J].Neurocomputing,2020,390:108-116.
[10]LIANG N,YANG Z,LI Z,et al.Semi-supervised multi-viewclustering with graph-regularized partially shared non-negative matrix factorization[J].Knowledge-Based Systems,2020,190:1-10.
[11]CHEN G,CHEN X,WANG J,et al.Robust non-negative matrix factorization for link prediction in complex networks using manifold regularization and sparse learning[J].Physica A:Statistical Mechanics and its Applications,2020,539:1-18.
[12]YE W,WANG H,YAN S,et al.Nonnegative matrix factorization for clustering ensemble based on dark knowledge[J].Knowledge-Based Systems,2019,163:624-631.
[13]KONG D,DING C,HUANG H.Robust nonnegative matrix factorization using l21-norm[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Mana-gement.2015:871-880.
[14]LAM E Y.Non-negative matrix factorization for images withLaplacian noise[C]//Proceedings IEEE Asia Pacific Conference on Circuits and Systems.2008:798-801.
[15]GUAN N,TAO D,LUO Z,et al.MahNMF:Manhattan non-negative matrix factorization [EB/OL].(2012-07-14)[2012-07-14].https://arxiv.org/abs/1207.3438.
[16]NESTEROV Y.Smooth minimization of non-smooth functions[J].Mathematical Programming,2005,103(1):127-152.
[17]ZHANG L,CHEN Z,ZHENG M,et al.Robust non-negativematrix factorization[J].Frontiers of Electrical and Electronic Engineering,2011,6(2):192-200.
[18]DU L,LI X,SHEN Y.Robust nonnegative matrix factorization via half-quadratic minimization[C]//Proceedings of the 12th International Conference on Data Mining.2012:201-210.
[19]GUAN N,LIU T,ZHANG Y,et al.Truncated Cauchy non-negative matrix factorization[J].IEEE Transactions on PatternAnalysis and Machine Intelligence,2019,41(1):246-259.
[20]NAGY F.Parameter estimation of the Cauchy distribution in information theory approach[J].Journal of Universal Computer Science,2006,12(9):1332-1344.
[21]NIKOLOVA M,NG M.Analysis of half-quadratic minimization methods for signal and image recovery[J].SIAM Journal onScientific Computing,2006,27(3):937-966.
[22]NIKOLOVA M,CHAN R.The equivalence of half-quadraticminimization and the gradient linearization iteration[J].IEEE Transactions on Image Processing,2007,16(6):1623-1627.
[23]NIE F,DING C,LUO D,et al.Improved minmax cut graph clustering with nonnegative relaxation[C]//Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases.2010:451-466.
[24]NIE F,XU D,TSANG I,et al.Spectral embedded clustering[C]//Proceedings of International Joint Conferences on Artificial Intelligence.2009:1181-1186.
[1] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于分层抽样优化的面向异构客户端的联邦学习
Federated Learning Based on Stratified Sampling Optimization for Heterogeneous Clients
计算机科学, 2022, 49(9): 183-193. https://doi.org/10.11896/jsjkx.220500263
[2] 柴慧敏, 张勇, 方敏.
基于特征相似度聚类的空中目标分群方法
Aerial Target Grouping Method Based on Feature Similarity Clustering
计算机科学, 2022, 49(9): 70-75. https://doi.org/10.11896/jsjkx.210800203
[3] 鲁晨阳, 邓苏, 马武彬, 吴亚辉, 周浩浩.
基于DBSCAN聚类的集群联邦学习方法
Clustered Federated Learning Methods Based on DBSCAN Clustering
计算机科学, 2022, 49(6A): 232-237. https://doi.org/10.11896/jsjkx.211100059
[4] 郁舒昊, 周辉, 叶春杨, 王太正.
SDFA:基于多特征融合的船舶轨迹聚类方法研究
SDFA:Study on Ship Trajectory Clustering Method Based on Multi-feature Fusion
计算机科学, 2022, 49(6A): 256-260. https://doi.org/10.11896/jsjkx.211100253
[5] 毛森林, 夏镇, 耿新宇, 陈剑辉, 蒋宏霞.
基于密度敏感距离和模糊划分的改进FCM算法
FCM Algorithm Based on Density Sensitive Distance and Fuzzy Partition
计算机科学, 2022, 49(6A): 285-290. https://doi.org/10.11896/jsjkx.210700042
[6] 陈景年.
一种适于多分类问题的支持向量机加速方法
Acceleration of SVM for Multi-class Classification
计算机科学, 2022, 49(6A): 297-300. https://doi.org/10.11896/jsjkx.210400149
[7] 刘丽, 李仁发.
医疗CPS协作网络控制策略优化
Control Strategy Optimization of Medical CPS Cooperative Network
计算机科学, 2022, 49(6A): 39-43. https://doi.org/10.11896/jsjkx.210300230
[8] 陈佳舟, 赵熠波, 徐阳辉, 马骥, 金灵枫, 秦绪佳.
三维城市场景中的小物体检测
Small Object Detection in 3D Urban Scenes
计算机科学, 2022, 49(6): 238-244. https://doi.org/10.11896/jsjkx.210400174
[9] 邢云冰, 龙广玉, 胡春雨, 忽丽莎.
基于SVM的类别增量人体活动识别方法
Human Activity Recognition Method Based on Class Increment SVM
计算机科学, 2022, 49(5): 78-83. https://doi.org/10.11896/jsjkx.210400024
[10] 朱哲清, 耿海军, 钱宇华.
面向化学结构的线段聚类算法
Line-Segment Clustering Algorithm for Chemical Structure
计算机科学, 2022, 49(5): 113-119. https://doi.org/10.11896/jsjkx.210700131
[11] 张宇姣, 黄锐, 张福泉, 隋栋, 张虎.
基于菌群优化的近邻传播聚类算法研究
Study on Affinity Propagation Clustering Algorithm Based on Bacterial Flora Optimization
计算机科学, 2022, 49(5): 165-169. https://doi.org/10.11896/jsjkx.210800218
[12] 左园林, 龚月姣, 陈伟能.
成本受限条件下的社交网络影响最大化方法
Budget-aware Influence Maximization in Social Networks
计算机科学, 2022, 49(4): 100-109. https://doi.org/10.11896/jsjkx.210300228
[13] 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞.
基于节点相似性和网络嵌入的复杂网络社区发现算法
Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding
计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[14] 韩洁, 陈俊芬, 李艳, 湛泽聪.
基于自注意力的自监督深度聚类算法
Self-supervised Deep Clustering Algorithm Based on Self-attention
计算机科学, 2022, 49(3): 134-143. https://doi.org/10.11896/jsjkx.210100001
[15] 蒲实, 赵卫东.
一种面向动态科研网络的社区检测算法
Community Detection Algorithm for Dynamic Academic Network
计算机科学, 2022, 49(1): 89-94. https://doi.org/10.11896/jsjkx.210100023
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!