Computer Science, 2014, Vol. 41, Issue (10): 91-94. doi: 10.11896/j.issn.1002-137X.2014.10.021

• 2013 Joint Conference on Harmonious Human-Machine Environment •

STMLRC: Sparse Topic Model with Low Rank Constraint

LIU Chao, ZHUANG Lian-sheng and YU Neng-hai

  1. School of Information, University of Science and Technology of China, Hefei 230027
  • Online: 2018-11-14  Published: 2018-11-14

Abstract: The projection matrix learned by classic Latent Semantic Analysis is often dense, which leads to high storage cost and unclear semantics for each topic. To tackle this problem, a novel sparse topic model is proposed in this paper. By enforcing sparsity on the projection matrix, the new model associates each topic with only a small number of relevant words, which gives each topic a clearer semantic interpretation. Moreover, by enforcing a low-rank constraint on the encoding coefficient matrix, data projected into the topic subspace show better clustering properties. Experimental results show that the topic subspace learned by the new model is more favorable for classification and that the storage cost of the projection matrix is significantly reduced.

Key words: Topic model, Sparse representation, Low rank representation
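
The abstract names two constraints, sparsity on the projection matrix and low rank on the encoding matrix, without stating the objective function or the solver. As a rough illustration only, the sketch below shows the two standard proximal operators commonly used to impose such constraints: entrywise soft-thresholding for an L1 penalty and singular value thresholding for a nuclear-norm penalty. It is not the authors' algorithm; the function names, the matrices `A` and `Z`, and the threshold values are assumptions made for this example.

```python
import numpy as np

def soft_threshold(M, tau):
    """Proximal operator of tau * ||M||_1: shrink every entry toward zero by tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Proximal operator of tau * ||M||_*: shrink every singular value by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunken singular values, then recombine.
    return (U * s_shrunk) @ Vt

# Toy data (hypothetical): A plays the role of a term-by-topic projection
# matrix, Z the role of a topic-by-document encoding matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
Z = rng.normal(size=(4, 8))

A_sparse = soft_threshold(A, 0.8)             # small entries become exactly zero
Z_lowrank = singular_value_threshold(Z, 2.0)  # small singular values are zeroed

print("nonzeros in A before/after:", np.count_nonzero(A), np.count_nonzero(A_sparse))
print("rank of Z before/after:", np.linalg.matrix_rank(Z), np.linalg.matrix_rank(Z_lowrank))
```

In an iterative solver such as the augmented Lagrange multiplier method, operators of this kind are typically applied once per iteration to the corresponding variable; how the paper combines them is not specified on this page.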

