Computer Science ›› 2020, Vol. 47 ›› Issue (10): 97-101. doi: 10.11896/jsjkx.190700073

• Database & Big Data & Data Science •

Personalized Microblog Recommendation Model Integrating Content Similarity and Multi-feature Computing

LIU Yu-dong, SUN Hao, JIANG Yun-cheng

  1. School of Computer Science, South China Normal University, Guangzhou 510631, China
  • Received: 2019-07-09 Revised: 2019-09-03 Online: 2020-10-15 Published: 2020-10-16
  • Corresponding author: LIU Yu-dong (cnlyd@163.com)
  • About author: LIU Yu-dong, born in 1974, master, lecturer. His main research interests include natural language processing and machine learning.
  • Supported by:
    National Natural Science Foundation of China, General Program (61772210) and Guangzhou Science and Technology Project (201807010043)

Abstract: As microblogs grow in popularity, problems such as information overload are becoming increasingly prominent, and helping users find the microblogs they need quickly and accurately has become an urgent problem. Although microblog recommendation based on collaborative filtering or on LDA can achieve a certain level of accuracy, it cannot solve two problems: content classification that is too general, and the weaknesses of the LDA model on short texts. This paper therefore proposes a personalized microblog recommendation model integrating content similarity and multi-feature computing. First, starting from the semantics of microblog content, the content similarity between a user and a microblog is calculated based on word2vec. Then, the freshness and popularity of a microblog are calculated from features such as its posting time and its numbers of likes, comments and reposts. Finally, the content similarity, freshness and popularity are considered together to calculate a ranking score for each microblog, which yields personalized microblog recommendations for the user. Because the model recommends according to content similarity, it avoids the above problems and makes the recommendation results semantically more accurate. Experimental results show that the proposed model performs well in precision, recall and F-measure; in particular, precision improves markedly, by about 10%, and the F-measure improves by about 5%, which demonstrates the effectiveness of the model.

Key words: Freshness, Microblog, Popularity, Similarity, word2vec

CLC Number:

  • TP391
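
The three stages described in the abstract (content similarity, freshness and popularity, combined ranking score) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so everything below is an assumption made for illustration, including the toy three-dimensional vectors standing in for a trained word2vec model, the mean-of-word-vectors text representation, the exponential half-life decay for freshness, the log-scaled interaction count for popularity, the weighted-sum score, and all names such as `Post`, `ranking_score` and `recommend`.

```python
import math
from dataclasses import dataclass

# Hypothetical toy vectors; a real system would load trained word2vec embeddings.
TOY_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "goal": [0.85, 0.15, 0.05],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.8, 0.3],
}


@dataclass
class Post:
    words: list        # tokenized microblog text
    age_hours: float   # time since posting
    likes: int
    comments: int
    reposts: int


def mean_vector(words, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is missing or zero."""
    if a is None or b is None:
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def freshness(age_hours, half_life_hours=24.0):
    """Assumed decay: 1.0 for a new post, halving every half_life_hours."""
    return 0.5 ** (age_hours / half_life_hours)


def popularity(post, cap=10_000):
    """Assumed form: log-scaled interaction total, clipped to [0, 1]."""
    total = post.likes + post.comments + post.reposts
    return min(1.0, math.log1p(total) / math.log1p(cap))


def ranking_score(user_words, post, vectors=TOY_VECTORS, weights=(0.6, 0.2, 0.2)):
    """Assumed combination: weighted sum of similarity, freshness, popularity."""
    sim = cosine(mean_vector(user_words, vectors), mean_vector(post.words, vectors))
    w_sim, w_fresh, w_pop = weights
    return w_sim * sim + w_fresh * freshness(post.age_hours) + w_pop * popularity(post)


def recommend(user_words, posts, top_n=10):
    """Return the top_n posts ordered by descending ranking score."""
    ranked = sorted(posts, key=lambda p: ranking_score(user_words, p), reverse=True)
    return ranked[:top_n]
```

With a user profile of `["football", "goal"]`, a post about `["match", "goal"]` ranks above an otherwise identical post about `["stock", "market"]`, because only the similarity term differs. The weights and the half-life are placeholders that would have to be tuned on real data.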