Computer Science ›› 2015, Vol. 42 ›› Issue (8): 279-282.

Study of Semantic Understanding by LDA

GAO Yang, YANG Lu, LIU Xiao-sheng and YAN Jian-feng   

Online: 2018-11-14   Published: 2018-11-14

Abstract: Latent Dirichlet allocation (LDA) is a popular model for text clustering and has been shown to improve information retrieval performance by effectively interpreting queries and documents. Two main algorithms are used for inference in the LDA model: Gibbs sampling and belief propagation. This paper compares the effect of these two inference algorithms on information retrieval at different topic scales, and uses two different ways to interpret queries and documents: one represents them by their document-topic distributions, the other represents them by word refactoring. Experimental results show that the document-topic distribution representation and the Gibbs sampling inference algorithm can improve information retrieval performance.

Key words: Latent Dirichlet allocation, Information retrieval, Approximate inference, Textual interpretation
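
The two representations named in the abstract can be illustrated with a minimal sketch. It assumes that "word refactoring" means rebuilding a word distribution from the topic mixture, P(w|d) = Σ_k P(w|k) P(k|d), and that queries and documents are compared by cosine similarity; neither assumption is confirmed by the abstract. All numbers, and the names `phi`, `theta_docs`, `theta_query`, `reconstruct_words`, and `rank`, are hypothetical toy values chosen for illustration, not results or code from the paper. In practice the topic-word and document-topic distributions would come from an inference algorithm such as Gibbs sampling or belief propagation.

```python
import math

# Toy, made-up parameters for K=2 topics over a 3-word vocabulary.
# phi[k][w] = P(word w | topic k); each row sums to 1.
phi = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
]

# theta[k] = P(topic k | document or query); each vector sums to 1.
theta_docs = [
    [0.9, 0.1],
    [0.2, 0.8],
]
theta_query = [0.8, 0.2]


def reconstruct_words(theta, phi):
    """Assumed word-level representation: P(w | d) = sum_k P(w | k) * P(k | d)."""
    vocab_size = len(phi[0])
    return [
        sum(theta[k] * phi[k][w] for k in range(len(theta)))
        for w in range(vocab_size)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank(query_vec, doc_vecs):
    """Return document indices sorted by decreasing similarity to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Representation 1: compare document-topic distributions directly.
ranking_by_topics = rank(theta_query, theta_docs)

# Representation 2: compare word distributions rebuilt from the topics.
ranking_by_words = rank(
    reconstruct_words(theta_query, phi),
    [reconstruct_words(t, phi) for t in theta_docs],
)
```

The topic-level representation compares vectors of length K, while the word-level one compares vectors of vocabulary length, so the two can rank documents differently on real data even though they agree on this toy example.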
