Computer Science ›› 2016, Vol. 43 ›› Issue (3): 57-61, 79. doi: 10.11896/j.issn.1002-137X.2016.03.011

• 15th China Conference on Machine Learning •

Topic Model Embedded in Collaborative Filtering Recommendation Algorithm

GAO Na and YANG Ming

  1. School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61272222) and the Key Program of the National Natural Science Foundation of China (61432008).

Abstract: Collaborative filtering (CF) has become one of the most popular recommendation algorithms owing to its accuracy and efficiency. A CF algorithm builds a model of each user's interests by analyzing the user's historical rating records, and then generates a set of recommendations for that user. However, the rating records available in a recommender system are extremely limited, so traditional CF algorithms face a serious data sparsity problem. To address this problem, we propose an improved collaborative filtering recommendation algorithm that embeds the LDA topic model, named ULR-CF. The algorithm applies LDA topic modeling to the tag sets of users and items to discover latent topic information, and then combines the document-topic probability distribution matrix with the rating matrix to measure user similarity and item similarity. Experimental results indicate that the proposed ULR-CF algorithm effectively alleviates the data sparsity problem and significantly improves the accuracy of the recommender system.

Key words: Collaborative filtering, Sparsity, Topic model
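
The abstract does not give the formula that ULR-CF uses to combine the two matrices, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that a document-topic distribution has already been obtained for each user from some LDA implementation, that similarity is measured with cosine similarity, and that the two similarities are blended linearly with a hypothetical weight `alpha`. The function names and the toy data are likewise assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(ratings_a, ratings_b, topics_a, topics_b, alpha=0.5):
    """Blend rating-based and topic-based similarity.

    alpha is a hypothetical mixing weight (assumption, not from the paper):
    alpha=1 uses ratings only, alpha=0 uses topic distributions only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rating_sim = cosine(ratings_a, ratings_b)
    topic_sim = cosine(topics_a, topics_b)
    return alpha * rating_sim + (1.0 - alpha) * topic_sim


# Toy data (made up): two users with no co-rated items (0 = unrated),
# whose tag-derived topic distributions are nevertheless close.
ratings_u1 = [5, 0, 0, 3]
ratings_u2 = [0, 4, 2, 0]
topics_u1 = [0.7, 0.2, 0.1]
topics_u2 = [0.6, 0.3, 0.1]

rating_only = cosine(ratings_u1, ratings_u2)
blended = combined_similarity(ratings_u1, ratings_u2, topics_u1, topics_u2)
print("rating-only similarity:", rating_only)
print("blended similarity:", blended)
```

In this toy case the two users share no rated items, so the rating-only similarity is zero, while the blended score remains positive because their topic distributions are similar. This is one way the topic information could compensate for sparse ratings.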

