Computer Science, 2016, Vol. 43, Issue (12): 101-107. doi: 10.11896/j.issn.1002-137X.2016.12.018

Sliding-window Based Topic Modeling

CHANG Dong-ya, YAN Jian-feng, YANG Lu and LIU Xiao-sheng   

Online: 2018-12-01  Published: 2018-12-01

Abstract: LDA (Latent Dirichlet Allocation) is an important hierarchical Bayesian model for probabilistic topic modeling, with many important applications in text mining. The model takes into account neither the order of documents nor the order of words within a document. This simplifies the problem, but it also leaves considerable room for improvement. To exploit this, a sliding-window based topic model was proposed. The fundamental idea of this model is that the topic of a word in a specific document is strongly related to, and mainly affected by, the words near it. By adjusting the window size and the sliding step, each document is cut into smaller pieces. In addition, to handle big datasets and data streams, an online sliding-window topic model was proposed. Experiments on four common datasets show that the sliding-window based topic model achieves better generalization performance and accuracy.
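
The segmentation step described above (cutting a document into smaller pieces controlled by a window size and a sliding step) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sliding_windows` and `window_bags` and the parameters `window_size` and `step` are hypothetical, and keeping a final partial window (so trailing words are not lost) is an assumption the abstract does not confirm. Each resulting piece would then presumably be treated as a short pseudo-document by an LDA-style inference routine, which is not shown here.

```python
from collections import Counter
from typing import List


def sliding_windows(tokens: List[str], window_size: int, step: int) -> List[List[str]]:
    """Cut a token sequence into pieces of at most `window_size` tokens,
    starting a new piece every `step` tokens.

    Hypothetical helper. Assumption: a final, shorter window is kept so
    that the last tokens of the document still appear in some piece.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    pieces = []
    for start in range(0, len(tokens), step):
        pieces.append(tokens[start:start + window_size])
        # Stop once the current window already reaches the end of the document.
        if start + window_size >= len(tokens):
            break
    return pieces


def window_bags(tokens: List[str], window_size: int, step: int) -> List[Counter]:
    """Turn each window into a bag of words (word -> count), i.e. the
    per-piece input a bag-of-words topic model would consume."""
    return [Counter(piece) for piece in sliding_windows(tokens, window_size, step)]


if __name__ == "__main__":
    doc = "topic models treat each document as a bag of words".split()
    for piece in sliding_windows(doc, window_size=4, step=2):
        print(piece)
```

With `step` smaller than `window_size`, neighbouring windows overlap, so a word appears in several pieces and its topic can be informed by different local contexts; setting `step` equal to `window_size` gives non-overlapping chunks instead.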

Key words: Latent Dirichlet allocation, topic model, sliding window
