Computer Science ›› 2016, Vol. 43 ›› Issue (12): 120-124. doi: 10.11896/j.issn.1002-137X.2016.12.021


Online LDA on Dynamic Vocabulary

ZHANG Jian-wei, YAN Jian-feng, LIU Xiao-sheng and YANG Lu   

  • Online: 2018-12-01  Published: 2018-12-01

Abstract: Most current online LDA algorithms assume a fixed vocabulary table. In practice, the vocabulary table often fails to match the corpus being processed, which degrades the precision of LDA. To solve this problem, we let each topic's word distribution follow a Dirichlet process (DP) and re-derive the model under the belief propagation (BP) framework. The vocabulary table can then start empty before the algorithm runs and grow continually as new words arrive. Experimental results show that the new algorithm matches the vocabulary table to the corpus more closely, and that the dynamic vocabulary table yields better perplexity and pointwise mutual information (PMI) than state-of-the-art online algorithms with a fixed vocabulary.

Key words: Latent Dirichlet allocation, Dynamic vocabulary, Dirichlet process, Streaming process
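
The abstract's idea of a vocabulary table that starts empty and grows with the stream can be illustrated with a minimal sketch. This shows only the bookkeeping side and is not the authors' algorithm: it does not implement the Dirichlet process prior or the belief propagation updates. The names `DynamicVocabulary` and `stream_minibatches`, and the toy documents, are hypothetical and chosen purely for illustration.

```python
from collections import Counter


class DynamicVocabulary:
    """Hypothetical vocabulary table: starts empty, grows as unseen words arrive."""

    def __init__(self):
        self.word_to_id = {}   # word -> integer id
        self.id_to_word = []   # integer id -> word

    def encode(self, tokens):
        """Map tokens to ids, assigning a fresh id to every unseen word."""
        ids = []
        for token in tokens:
            if token not in self.word_to_id:
                self.word_to_id[token] = len(self.id_to_word)
                self.id_to_word.append(token)
            ids.append(self.word_to_id[token])
        return ids

    def __len__(self):
        return len(self.id_to_word)


def stream_minibatches(documents, batch_size):
    """Yield consecutive mini-batches, imitating a document stream."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


# Toy corpus (assumed data, for illustration only).
docs = [
    "topic models learn topics".split(),
    "online models process streams".split(),
    "streams bring new words".split(),
]

vocab = DynamicVocabulary()   # empty before any document is seen
sizes = []
for batch in stream_minibatches(docs, batch_size=1):
    for doc in batch:
        # Bag-of-words counts keyed by word id; an online LDA update
        # would consume these counts at this point.
        counts = Counter(vocab.encode(doc))
    sizes.append(len(vocab))

print(sizes)
```

In a fixed-vocabulary setting, words outside the predefined table would have to be dropped or mapped to a placeholder. Here every mini-batch can extend the table, which is why a topic-word distribution over such a table needs a prior that allows an unbounded number of words.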

