Computer Science, 2017, Vol. 44, Issue (Z11): 160-165. doi: 10.11896/j.issn.1002-137X.2017.11A.033

Identifying Users’ Gender via Social Representations

ZHU Pei-song, QIAN Tie-yun and WU Min-quan   

Online: 2018-12-01  Published: 2018-12-01

Abstract: Gender prediction has attracted considerable research interest because of its potential applications, such as targeted advertising and personalized search. Most existing studies rely on the textual content that users post. However, such text is often hard to access, which makes it difficult to extract text features. In this paper, we propose a novel framework that uses only users' IDs for gender prediction. The key idea is to represent users in an embedding space learned from their social connections. We present two strategies that adapt the word embedding technique to user embedding. The first serializes users' IDs into sequences to capture the order of their social contexts. The second embeds each user within a large sliding window over its social contexts. We conducted extensive experiments on two real-world data sets from Sina Weibo. The results show that our method significantly outperforms state-of-the-art graph embedding baselines, and its accuracy also exceeds that of content-based approaches.

Key words: Gender prediction, Users in social media, Social contexts, Social representations
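The two strategies in the abstract (serializing user IDs into ordered sequences, then applying a word-embedding-style sliding window) can be illustrated with a minimal sketch of how skip-gram (word2vec-style) training pairs might be generated from social contexts. This is not the authors' implementation. The input format (a mapping from each user ID to an ordered list of connected account IDs), the placement of the user's own ID at the head of its sequence, and the helper names `build_sequences`, `skipgram_pairs` and `user_contexts` are all hypothetical. What the sketch shows is that with a small window the user co-occurs only with its nearest contexts, while a window as long as the sequence pairs the user with every context.

```python
from typing import Dict, List, Tuple


def build_sequences(contexts: Dict[str, List[str]]) -> List[List[str]]:
    """Serialize each user's social context into an ordered ID sequence.

    Hypothetical layout: the user's own ID comes first, followed by the IDs
    of the accounts it connects to, in the order they appear in the input.
    """
    return [[user] + list(ids) for user, ids in contexts.items()]


def skipgram_pairs(seq: List[str], window: int) -> List[Tuple[str, str]]:
    """Emit word2vec-style (target, context) pairs within +/- `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(seq):
        lo, hi = max(0, i - window), min(len(seq), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((target, seq[j]))
    return pairs


def user_contexts(pairs: List[Tuple[str, str]], user: str) -> List[str]:
    """Return the context IDs that co-occur with `user` as the target."""
    return [c for t, c in pairs if t == user]


if __name__ == "__main__":
    # Invented toy data: hypothetical user u1 connects to four accounts.
    seqs = build_sequences({"u1": ["a", "b", "c", "d"]})
    small = skipgram_pairs(seqs[0], window=1)
    large = skipgram_pairs(seqs[0], window=len(seqs[0]) - 1)
    print(user_contexts(small, "u1"))  # ['a']
    print(user_contexts(large, "u1"))  # ['a', 'b', 'c', 'd']
```

In practice, such sequences could be passed to an existing skip-gram trainer (for example, gensim's `Word2Vec` with `sg=1` and a large `window`), and the learned user vectors fed to a standard classifier for gender labels; these tooling choices are likewise assumptions, not details taken from the paper.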
