Computer Science (计算机科学), 2015, Vol. 42, Issue 4: 185-189. doi: 10.11896/j.issn.1002-137X.2015.04.037
WU Hai-tao and YING Shi (吴海涛, 应时)
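Abstract: As information becomes an increasingly important part of social development, human information dissemination shows ever more pronounced audience segmentation, and classifying users has become an important research topic in the study of human information activity. With this goal, this paper classifies social media users by topic of interest using three approaches: one based on information content, one based on topological relations, and one that combines the two. For content-based classification, the LDA topic model extracts each user's topic distribution from the content that user has published; based on this distribution, several models, including support vector machines, decision trees, and Bayesian classifiers, classify users by topic of interest. For topology-based classification, a classification model is built on the finding that users with the same topic of interest tend to share followers. A combined method that integrates information content and topological relations is then proposed. Finally, experiments on large-scale Twitter data show that the combined method classifies users' interests markedly better than either the content-only method or the follower-topology-only method.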
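The follower-overlap idea can be illustrated with a minimal sketch. The abstract does not specify the actual topology-based model, so everything here is an assumption made for illustration: the use of Jaccard similarity, the mean-similarity vote, and the hypothetical names `jaccard` and `classify_by_followers`. The toy data is invented and is not from the paper's Twitter experiments.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two follower sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_by_followers(followers, labeled):
    """Assign the topic whose labeled users share the most followers.

    followers: set of follower IDs of the user to classify.
    labeled:   list of (follower_set, topic) pairs for users with known topics.
    Returns the topic with the highest mean Jaccard similarity,
    or None when there are no labeled users.
    NOTE: an illustrative stand-in, not the model used in the paper.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for other_followers, topic in labeled:
        totals[topic] += jaccard(followers, other_followers)
        counts[topic] += 1
    if not totals:
        return None
    return max(totals, key=lambda t: totals[t] / counts[t])


# Toy example (invented data): two "sports" users and one "politics" user.
labeled_users = [
    ({"a", "b", "c"}, "sports"),
    ({"b", "c", "d"}, "sports"),
    ({"x", "y", "z"}, "politics"),
]
predicted = classify_by_followers({"b", "c", "e"}, labeled_users)
```

In this toy case the target user shares followers only with the two "sports" users, so that topic wins the vote. A combined method along the lines the abstract describes could, for instance, feed such similarity scores together with LDA topic proportions into a single classifier, though the abstract does not say how the two signals are actually merged.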