Computer Science ›› 2017, Vol. 44 ›› Issue (Z11): 160-165. doi: 10.11896/j.issn.1002-137X.2017.11A.033

• Pattern Recognition and Image Processing •

Identifying Users’ Gender via Social Representations

ZHU Pei-song, QIAN Tie-yun, WU Min-quan

  1. State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072
  • Online: 2018-12-01 Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China project "Research on Methods for Identifying Spam User Groups in Social Media" (61572376)

Abstract: Gender prediction has attracted great research interest because of its potential applications, such as targeted advertising and personalized search. Most existing studies rely on text content. However, text information is sometimes hard to obtain, which makes text features difficult to extract. In this paper, we propose a novel framework that uses only users’ IDs for gender prediction. The key idea is to represent users in an embedded connection space. We present two strategies for adapting the word embedding technique to user embedding: 1) serializing users’ IDs to obtain an order over the social context; 2) embedding users with a large sliding window of contexts. We conducted extensive experiments on two real-world data sets from Sina Weibo. The results show that our method significantly outperforms state-of-the-art graph embedding baselines, and its accuracy is also higher than that of content-based approaches.

Key words: Gender prediction, Social media users, Social contexts, Social representations
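
The two strategies named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `serialize_contexts` and `skipgram_pairs`, the toy `follows` data, and the choice to order contacts by sorted ID are all hypothetical, since the abstract does not specify them. The sketch only shows how ordered ID sequences and a large sliding window would yield the (center, context) pairs that a skip-gram style embedding model trains on.

```python
from typing import Dict, List, Tuple


def serialize_contexts(follows: Dict[str, List[str]]) -> List[List[str]]:
    """Turn each user's social contacts into one ordered ID sequence.

    Assumption: contacts are ordered by sorting their IDs; the abstract
    does not say which ordering the authors use.
    """
    return [[user] + sorted(contacts) for user, contacts in sorted(follows.items())]


def skipgram_pairs(sequence: List[str], window: int) -> List[Tuple[str, str]]:
    """List (center, context) ID pairs within `window` positions of each other."""
    pairs = []
    for i, center in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        pairs.extend((center, sequence[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical toy follow lists keyed by user ID.
follows = {"u1": ["u9", "u3", "u7"], "u2": ["u3"]}
sequences = serialize_contexts(follows)
# A window as long as the sequence pairs every ID with every other ID in it.
pairs = [p for seq in sequences for p in skipgram_pairs(seq, window=len(seq))]
```

With a small window, only neighbouring IDs in the serialized order become contexts of each other; a window as large as the sequence treats a user's whole contact list as one context, which is the effect the second strategy appears to aim for.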

