Study of Semantic Understanding by LDA

Abstract

Latent Dirichlet allocation (LDA) is a popular model for text clustering and has been shown to improve information retrieval performance by interpreting queries and documents effectively. Two main algorithms are used for inference in the LDA model: Gibbs sampling and belief propagation. This paper compares the effect of these two inference algorithms on information retrieval at different topic scales, and uses two different ways to interpret queries and documents: one represents them by their document-topic distributions, the other by word refactoring. Experimental results show that the document-topic distribution representation and the Gibbs sampling inference algorithm improve information retrieval performance.
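To make the document-topic representation concrete, the following is a minimal sketch of collapsed Gibbs sampling for LDA, followed by ranking documents against a query by comparing their document-topic distributions. It is an illustration under stated assumptions, not the paper's implementation: the function names `gibbs_lda` and `cosine`, the hyperparameter values, the toy corpus, the use of cosine similarity as the ranking score, and the shortcut of treating the query as one more short document in the corpus are all assumptions made here.

```python
import math
import random


def gibbs_lda(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (sketch).

    docs is a list of documents, each a list of integer word ids.
    Returns one topic distribution (list of floats) per document.
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens assigned to each topic
    z = []                                                # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Sample a new topic and restore the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Smoothed document-topic distributions estimated from the final counts.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(n_dk[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


def cosine(p, q):
    """Cosine similarity between two topic distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# Toy corpus (hypothetical): two documents plus a query, all as word-id lists.
corpus = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5]]
query = [0, 1]
theta = gibbs_lda(corpus + [query], vocab_size=6, num_topics=2, iters=50)
query_theta = theta[-1]
# Rank documents by similarity of their topic distribution to the query's.
ranking = sorted(range(len(corpus)), key=lambda d: cosine(theta[d], query_theta), reverse=True)
print(ranking)
```

A retrieval system would more plausibly infer the query's topic distribution with the trained topic-word counts held fixed rather than resampling the whole corpus; the combined sampling above only keeps the sketch short.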

Key words: Latent Dirichlet allocation, Information retrieval, Approximate inference, Textual interpretation

GAO Yang, YANG Lu, LIU Xiao-sheng and YAN Jian-feng. Study of Semantic Understanding by LDA[J]. Computer Science, 2015, 42(8): 279-282.

