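Abstract: Latent Dirichlet allocation (LDA) is widely used for text clustering. Effectively interpreting the queries and documents in information retrieval has been shown to improve retrieval performance. Gibbs sampling and belief propagation are two popular approximate inference algorithms for fitting the LDA model. This paper compares the effect of these two inference algorithms on retrieval performance across different numbers of topics. It also compares two ways in which LDA can interpret text: replacing the original query and document with the document's topic distribution, and replacing them with the document's word reconstruction. Experimental results show that the topic interpretation of documents, together with the Gibbs sampling algorithm, effectively improves retrieval performance.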
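The two text interpretations compared in the abstract can be illustrated with a minimal sketch. It assumes an LDA model has already been fitted (by Gibbs sampling or belief propagation, neither of which is shown here), yielding a topic distribution `theta` per text and a topic-word matrix `phi`. All numeric values are made-up toy data, the function names are hypothetical, and cosine similarity is an assumed ranking measure; the abstract does not state which measure the paper uses.

```python
import math

def reconstruct_words(theta, phi):
    """Word reconstruction: mix the topic-word rows of phi by the weights in theta."""
    vocab_size = len(phi[0])
    return [sum(theta[k] * phi[k][w] for k in range(len(theta)))
            for w in range(vocab_size)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy model: 2 topics over a 3-word vocabulary (illustrative values only).
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.2, 0.7]]
doc_thetas = [[0.9, 0.1], [0.2, 0.8]]   # topic distributions of two documents
query_theta = [0.8, 0.2]                # topic distribution of the query

# Interpretation 1: represent query and documents by their topic distributions.
topic_ranking = rank(query_theta, doc_thetas)

# Interpretation 2: represent them by their reconstructed word distributions.
query_words = reconstruct_words(query_theta, phi)
doc_words = [reconstruct_words(t, phi) for t in doc_thetas]
word_ranking = rank(query_words, doc_words)
```

In this toy case both representations rank the first document above the second. On a real collection the two rankings can differ, which is the kind of difference the comparison in the abstract is about.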