Computer Science ›› 2015, Vol. 42 ›› Issue (6): 243-246.doi: 10.11896/j.issn.1002-137X.2015.06.051


Nave Parallel LDA

GAO Yang, YAN Jian-feng and LIU Xiao-sheng   

Online: 2018-11-14    Published: 2018-11-14

Abstract: Parallel latent Dirichlet allocation (LDA) spends a great deal of time on computation and communication, so training an LDA model takes long and the method cannot be widely applied. This paper proposed a naïve parallel LDA algorithm with two methods to address this problem: one adds an impact factor to each word and sets a threshold to reduce the size of the corpus; the other lowers the communication frequency to decrease communication time. Experimental results show that the optimized distributed LDA shortens the total training time by 36% and improves the speedup ratio, while the loss of accuracy stays below 1%.
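The two optimizations described in the abstract can be sketched as follows. This is only an illustrative sketch: the abstract does not define the impact factor, so frequency-based pruning is an assumption here, and `prune_corpus`, `train`, and `sync_every` are hypothetical names, not the paper's API.

```python
from collections import Counter

def prune_corpus(docs, threshold):
    """Method 1 (assumed form): approximate each word's impact factor
    by its relative corpus frequency and drop words below a threshold,
    shrinking the corpus each Gibbs sweep must scan."""
    freq = Counter(w for doc in docs for w in doc)
    total = sum(freq.values())
    return [[w for w in doc if freq[w] / total >= threshold] for doc in docs]

def train(n_iters, sync_every):
    """Method 2 (assumed form): instead of synchronizing topic-word
    counts across workers after every sweep, communicate only every
    `sync_every` iterations; returns the number of communication rounds."""
    syncs = 0
    for it in range(n_iters):
        # local Gibbs sweep on each worker's document shard (omitted)
        if (it + 1) % sync_every == 0:
            syncs += 1  # stands in for an all-reduce of count matrices
    return syncs

docs = [["lda", "topic", "model", "rare"],
        ["lda", "topic", "lda", "unique"]]
pruned = prune_corpus(docs, threshold=0.2)   # drops the singleton words
rounds = train(100, sync_every=5)            # 20 rounds instead of 100
```

Both knobs trade accuracy for speed: a higher threshold or a larger `sync_every` cuts work and communication but lets the per-worker models drift further from the global counts, which is consistent with the reported sub-1% accuracy loss.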

Key words: Latent Dirichlet allocation, Parallel, Speedup optimization

