Computer Science, 2016, Vol. 43, Issue (12): 120-124. doi: 10.11896/j.issn.1002-137X.2016.12.021
ZHANG Jian-wei, YAN Jian-feng, LIU Xiao-sheng and YANG Lu (张健伟, 严建峰, 刘晓升, 杨璐)
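Abstract: Most existing online latent Dirichlet allocation (LDA) algorithms rely on a fixed vocabulary. In practical applications the vocabulary often fails to match the corpus being processed, which limits the usefulness of the model. To address this, the topic-word distribution is made to follow a Dirichlet process within the belief propagation (BP) framework, and the update formulas are re-derived, so that the vocabulary is empty before the model runs and newly discovered words are continually added to it during processing. Experiments show that the new dynamic-vocabulary algorithm not only fits the vocabulary to the corpus more closely, but also outperforms LDA models with a fixed vocabulary on two metrics: perplexity and the mutual information index.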
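The dynamic-vocabulary idea can be illustrated with a minimal sketch: the vocabulary starts empty and gains an entry whenever an unseen word arrives in the document stream. This covers only the bookkeeping side; it does not implement the Dirichlet process prior or the BP updates derived in the paper. The names `DynamicVocabulary` and `encode_stream`, and the whitespace tokenization, are assumptions made for illustration.

```python
from collections import Counter


class DynamicVocabulary:
    """Hypothetical helper: maps words to integer ids and grows on demand."""

    def __init__(self):
        self.word_to_id = {}   # empty before any document is processed
        self.id_to_word = []

    def add(self, word):
        """Return the id of `word`, assigning a new id if it is unseen."""
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.id_to_word)
            self.id_to_word.append(word)
        return self.word_to_id[word]

    def __len__(self):
        return len(self.id_to_word)


def encode_stream(docs, vocab):
    """Yield a bag-of-words Counter per document, extending vocab as needed.

    Assumption: documents are plain strings tokenized on whitespace.
    """
    for doc in docs:
        yield Counter(vocab.add(word) for word in doc.split())


vocab = DynamicVocabulary()
stream = ["topic models learn topics", "online topic models see new words"]
encoded = list(encode_stream(stream, vocab))
```

In an online topic model, each newly assigned id would also need a corresponding new column in the topic-word statistics; how that column is initialized is exactly what the Dirichlet process prior governs, and it is left out of this sketch.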