Computer Science, 2019, Vol. 46, Issue (9): 79-84. doi: 10.11896/j.issn.1002-137X.2019.09.010

• 35th National Database Conference of China •

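Mining User Interests on Twitter Using Wikipedia Category Graph

LIU Xiao-jie1, LV Xiao-qiang1, WANG Xiao-ling1, ZHANG Wei1, ZHAO An2

  1. Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China
  2. Suzhou Research Institute, Institute of Electronics, Chinese Academy of Sciences, Suzhou, Jiangsu 215123, China
  • Received: 2018-07-02 Online: 2019-09-15 Published: 2019-09-02
  • Corresponding author: WANG Xiao-ling (born 1975), female, professor, Ph.D. supervisor, CCF member. Her main research interests are data analysis and data management. E-mail: xlwang@sei.ecnu.edu.cn
  • About the authors: LIU Xiao-jie (born 1994), male, master's degree; main research interest: social media data mining. LV Xiao-qiang (born 1993), male, master's degree; main research interest: data mining. ZHANG Wei (born 1988), male, Ph.D., associate researcher; main research interests: data mining and natural language processing. ZHAO An (born 1992), female, master's degree; main research interests: natural language processing and image processing.
  • Supported by:
    National Natural Science Foundation of China (61472141), National Key Research and Development Program of China (2017YFC0803700), Shanghai Leading Academic Discipline Project (B412), Shanghai Collaborative Innovation Center of Trustworthy Software for Internet of Things (ZF1213)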

Abstract: Social networks such as Twitter play an important role in daily life, and their huge user bases make social network data mining highly valuable. User interest modeling on social networks has been widely studied and is used to provide personalized recommendations. This paper proposes a user interest mining and representation approach for Twitter based on the Wikipedia category graph, in which a user interest profile is represented as a vector of Wikipedia categories. First, according to the user's level of activity, the interests of active users are mined from the content of their tweets, while the interests of passive users are mined from the names and descriptions of the accounts they follow (followees). Then, the personalized PageRank algorithm is run on the Wikipedia category graph to extend and generalize these interests, producing a user interest profile expressed in Wikipedia categories. The proposed interest modeling strategy was evaluated in the context of tweet recommendation. The results show that, compared with existing Twitter user interest modeling approaches, the proposed approach significantly improves recommendation quality, indicating that it yields a more effective user interest profile.

Key words: Personalized PageRank, Social network, Tweet recommendation, User interest
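
The interest-expansion step described in the abstract can be illustrated with a minimal sketch of personalized PageRank by power iteration. This is not the authors' implementation: the toy category graph, the seed weights, the edge direction (subcategory to parent category), the damping factor `alpha=0.85`, and the names `personalized_pagerank`, `CATEGORY_GRAPH`, and `SEED_INTERESTS` are all assumptions made for illustration.

```python
def personalized_pagerank(graph, seeds, alpha=0.85, iterations=50):
    """Personalized PageRank by power iteration (illustrative sketch).

    graph: dict mapping a category to the list of categories it links to
           (assumed here: subcategory -> parent categories).
    seeds: dict mapping seed categories to positive weights, e.g. interests
           mined from tweets or from followee descriptions (hypothetical).
    alpha: probability of following an edge; 1 - alpha is the restart
           probability back to the seed categories.
    """
    nodes = set(graph) | set(seeds)
    for targets in graph.values():
        nodes.update(targets)

    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seeds must carry positive total weight")

    # Restart distribution: concentrated on the user's seed categories.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Mass sitting on categories without outgoing edges is returned
        # to the seed distribution so that the scores keep summing to 1.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        nxt = {n: (1.0 - alpha + alpha * dangling) * restart[n] for n in nodes}
        for n, targets in graph.items():
            if targets:
                share = alpha * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
        rank = nxt
    return rank


# Hypothetical toy category graph and seed interests.
CATEGORY_GRAPH = {
    "Machine learning": ["Artificial intelligence"],
    "Data mining": ["Artificial intelligence", "Data analysis"],
    "Artificial intelligence": ["Computer science"],
    "Data analysis": ["Computer science"],
    "Association football": ["Ball games"],
}
SEED_INTERESTS = {"Machine learning": 2.0, "Data mining": 1.0}

profile = personalized_pagerank(CATEGORY_GRAPH, SEED_INTERESTS)
for category, score in sorted(profile.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {score:.3f}")
```

In this sketch, categories reachable from the seeds (such as "Artificial intelligence") receive a nonzero score even though the user never mentioned them directly, while unrelated categories stay at zero. The resulting score dictionary plays the role of a category-vector interest profile.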

CLC number:

  • G633.67