Computer Science ›› 2015, Vol. 42 ›› Issue (6): 243-246. doi: 10.11896/j.issn.1002-137X.2015.06.051

• Artificial Intelligence •

Naive Parallel LDA

GAO Yang, YAN Jian-feng and LIU Xiao-sheng

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  • Online: 2018-11-14  Published: 2018-11-14
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61003154, 61373092, 61033013, 61272449, 61202029), the Major Project of the Jiangsu Provincial Department of Education (12KJA520004), the Soochow University Innovation Team (SDT2012B02) and the Open Project of the Guangdong Provincial Key Laboratory (SZU-GDPHPCL-2012-09).

Abstract: The parallel latent Dirichlet allocation (LDA) topic model spends a large amount of time on both computation and communication, so training a model takes too long and the method cannot be widely applied. This paper proposes a naive parallel LDA algorithm with separate improvements for the two costs. On the computation side, a per-word impact factor and a threshold are introduced to reduce the granularity of text training; on the communication side, the communication frequency is lowered to cut communication time. Experimental results show that the optimized parallel LDA speeds up training by 36% while keeping the accuracy loss within 1%, effectively improving the parallel speedup ratio.

Key words: Latent Dirichlet allocation, Parallel, Speedup optimization
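
The abstract only sketches the two optimizations, so the following is a minimal, single-process illustration rather than the authors' implementation: an AD-LDA-style collapsed Gibbs sampler in which (1) tokens of low-impact words are skipped, with the impact factor assumed here to be how far a word's topic distribution has moved since the last synchronization, and (2) per-worker counts are merged into the global counts only every sync_interval sweeps instead of every sweep. The function name, the impact-factor definition, the threshold and sync_interval are all assumptions made for illustration.

import numpy as np

def train_naive_parallel_lda(docs, V, K=10, iters=50, alpha=0.1, beta=0.01,
                             threshold=1e-3, sync_interval=5, n_workers=2, seed=0):
    # Collapsed Gibbs sampling for LDA over `docs` (lists of word ids < V),
    # simulating n_workers AD-LDA-style workers in one process.
    rng = np.random.default_rng(seed)
    D = len(docs)
    nwt = np.zeros((V, K))                      # global word-topic counts
    ndt = np.zeros((D, K))                      # doc-topic counts (never shared)
    z = []                                      # topic assignment per token
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            nwt[w, t] += 1
            ndt[d, t] += 1
    nt = nwt.sum(axis=0)                        # tokens per topic

    parts = [range(w, D, n_workers) for w in range(n_workers)]   # doc partition
    local_nwt = [nwt.copy() for _ in range(n_workers)]           # worker copies
    local_nt = [nt.copy() for _ in range(n_workers)]
    active = np.ones(V, dtype=bool)             # words still being resampled
    prev_phi = (nwt + beta) / (nt + V * beta)

    for it in range(iters):
        for lw, lt, doc_ids in zip(local_nwt, local_nt, parts):
            for d in doc_ids:
                for i, w in enumerate(docs[d]):
                    if not active[w]:
                        continue                # low-impact word: keep old topic
                    t = z[d][i]
                    lw[w, t] -= 1; ndt[d, t] -= 1; lt[t] -= 1
                    p = (lw[w] + beta) / (lt + V * beta) * (ndt[d] + alpha)
                    t = rng.choice(K, p=p / p.sum())
                    z[d][i] = t
                    lw[w, t] += 1; ndt[d, t] += 1; lt[t] += 1

        # Communication step, run only every sync_interval sweeps: fold each
        # worker's local updates into the global counts and refresh the copies.
        if (it + 1) % sync_interval == 0 or it == iters - 1:
            nwt = nwt + sum(lw - nwt for lw in local_nwt)
            nt = nwt.sum(axis=0)
            local_nwt = [nwt.copy() for _ in range(n_workers)]
            local_nt = [nt.copy() for _ in range(n_workers)]
            # Assumed per-word impact factor: how far the word's topic
            # distribution moved since the last sync; near-static words are
            # frozen until the next sync to cut per-sweep computation.
            phi = (nwt + beta) / (nt + V * beta)
            active = np.abs(phi - prev_phi).max(axis=1) >= threshold
            prev_phi = phi
    return nwt, ndt

A small synthetic run such as docs = [rng.integers(50, size=30).tolist() for _ in range(20)] followed by train_naive_parallel_lda(docs, V=50, K=5, iters=20) exercises both switches. The trade-off mirrors the abstract: freezing near-static words removes work from every sweep, and merging counts less often removes communication rounds, at the price of sampling from slightly stale counts, which is the kind of accuracy-for-speed trade-off the paper reports.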

