Computer Science, 2015, Vol. 42, Issue (4): 185-189. doi: 10.11896/j.issn.1002-137X.2015.04.037

• Artificial Intelligence •

Classifying the Interests of Social Media Users Based on Information Content and Topological Relations

WU Hai-tao, YING Shi

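  1. State Key Laboratory of Software Engineering, School of Computer Science, Wuhan University, Wuhan 430072; School of Software, Huanghuai University, Zhumadian 463000. State Key Laboratory of Software Engineering, School of Computer Science, Wuhan University, Wuhan 430072
  • Published: 2018-11-14 Online: 2018-11-14
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61070012, 61070022) and the Key Program of the National Natural Science Foundation of China (91118003,3,61272108)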

Classifying Interests of Social Media Users Based on Information Content and Social Graph

WU Hai-tao and YING Shi   

  • Online:2018-11-14 Published:2018-11-14

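Abstract: As society develops, information plays an increasingly important role in it, and human information dissemination increasingly exhibits audience segmentation, so classifying users has become an important research topic in the study of human information activity. With this goal, social media users are classified by interest topic using three approaches: information content, topological relations, and a combination of the two. For classification based on information content, the LDA topic model extracts each user's topic distribution from the content that user has posted, and several models, including support vector machines, decision trees, and Bayesian classifiers, then use this distribution to classify users by interest topic. For classification based on topological relations, a classification model is built on the finding that users who share an interest topic tend to have followers in common. A method that combines information content and topological relations is then proposed. Finally, experiments on large-scale Twitter data show that the combined method classifies user interests markedly better than methods that use information content or follower topology alone.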

Keywords: online social networks, interest classification, LDA, follower topology

Abstract: As society develops, human information dissemination increasingly exhibits audience segmentation, and user classification has become an important research topic. This article studies how to classify online social network users by interest topic, using information content, topological relations, and a combination of both. For classification based on information content, we use LDA to extract the topic distribution from the content each user has posted, and we feed this distribution to support vector machine, decision tree, Bayes, and other models to classify user interests. For classification based on topological relations, we find that users with the same interests tend to have more followers in common, and we build classification models on this finding. We then propose methods that combine information content and topological relations to classify users. Experiments on Twitter data show that the combined method outperforms methods based on information content or topological relations alone.

Key words: Online social networks, User classification, LDA, Topological relation
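
The idea of combining a content signal with a follower signal can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two signals are combined, so the weighted sum, the weight `alpha`, the cosine and Jaccard measures, and the mean-similarity decision rule are all assumptions made for illustration. The topic distributions are assumed to have been produced beforehand by an LDA model, and the toy users are invented.

```python
from collections import defaultdict
from math import sqrt


def jaccard(a, b):
    """Fraction of followers two users share (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(p, q):
    """Cosine similarity between two topic distributions of equal length."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def classify(user, labeled, alpha=0.5):
    """Return the interest label with the highest mean combined similarity.

    user    -- dict with 'topics' (list of floats) and 'followers' (set of ids)
    labeled -- list of (label, user_dict) pairs whose interests are known
    alpha   -- hypothetical weight: 1.0 = content only, 0.0 = followers only
    """
    scores = defaultdict(list)
    for label, other in labeled:
        content = cosine(user["topics"], other["topics"])
        topology = jaccard(user["followers"], other["followers"])
        scores[label].append(alpha * content + (1 - alpha) * topology)
    return max(scores, key=lambda label: sum(scores[label]) / len(scores[label]))


# Toy data, invented for illustration only (assumed LDA output plus follower ids).
labeled = [
    ("sports", {"topics": [0.8, 0.1, 0.1], "followers": {1, 2, 3, 4}}),
    ("sports", {"topics": [0.7, 0.2, 0.1], "followers": {2, 3, 4, 5}}),
    ("politics", {"topics": [0.1, 0.8, 0.1], "followers": {6, 7, 8}}),
    ("politics", {"topics": [0.2, 0.7, 0.1], "followers": {7, 8, 9}}),
]
new_user = {"topics": [0.4, 0.5, 0.1], "followers": {2, 3, 4, 10}}

for alpha in (1.0, 0.0, 0.5):
    print(alpha, classify(new_user, labeled, alpha))
```

On this toy data the content-only and follower-only settings pick different labels for `new_user`, which is the kind of disagreement a combined score has to resolve. A trained classifier such as the support vector machine or decision tree named in the abstract would replace the mean-similarity rule in practice.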

