Computer Science, 2018, Vol. 45, Issue (2): 121-124. doi: 10.11896/j.issn.1002-137X.2018.02.021

• The 6th National Conference on Intelligent Information Processing •

User Gender Classification with Dual-channel LSTM

WANG Li-min, YAN Qian, LI Shou-shan and ZHOU Guo-dong

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  • Online: 2018-02-15 Published: 2018-11-13
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61672366)

Abstract: User gender classification on microblogs aims to identify a user's gender from the user's information. Previous studies on gender classification mainly rely on a single type of feature (i.e., textual features or social features). In contrast, this paper proposes a dual-channel LSTM (Long Short-Term Memory) model that combines textual features (the microblog posts a user publishes) with social features (information about the accounts the user follows). First, a single-channel LSTM learns each of the two groups of features separately, yielding two feature representations. Then, a Merge layer is added to the neural network to combine the two representations for joint learning, so that the relationship between textual features and social features is fully learned. Finally, the dual-channel LSTM produces the classification result. Experimental results show that the dual-channel LSTM model achieves better user gender classification performance than traditional classification algorithms.

Key words: Gender classification, Sina Weibo, Dual-channel LSTM
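
The two-channel structure described in the abstract can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. It assumes that each channel receives a sequence of fixed-size vectors (for example, word embeddings), that the merge step is a plain concatenation of the two final hidden states, and that a single sigmoid unit produces the gender probability. All names (`init_lstm`, `lstm_encode`, `dual_channel_forward`, `build_model`) and all dimensions are hypothetical, and the weights are random rather than trained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, rng):
    """Random weights for the four LSTM gates: input, forget, output, candidate."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        gate: {"W": matrix(hidden_dim, input_dim + hidden_dim), "b": [0.0] * hidden_dim}
        for gate in "ifoc"
    }


def affine(layer, vec):
    """Compute W @ vec + b for one layer."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(layer["W"], layer["b"])]


def lstm_encode(params, sequence, hidden_dim):
    """Run one LSTM channel over a sequence and return its last hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        z = list(x) + h  # current input joined with the previous hidden state
        i = [sigmoid(a) for a in affine(params["i"], z)]
        f = [sigmoid(a) for a in affine(params["f"], z)]
        o = [sigmoid(a) for a in affine(params["o"], z)]
        g = [math.tanh(a) for a in affine(params["c"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


def build_model(input_dim=8, hidden_dim=4, seed=0):
    """Two independent LSTM channels plus one output unit (sizes are illustrative)."""
    rng = random.Random(seed)
    return {
        "hidden_dim": hidden_dim,
        "text": init_lstm(input_dim, hidden_dim, rng),
        "social": init_lstm(input_dim, hidden_dim, rng),
        "out": {"W": [[rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_dim)]],
                "b": [0.0]},
    }


def dual_channel_forward(text_seq, social_seq, model):
    """Encode each feature group in its own channel, merge, then classify."""
    h_text = lstm_encode(model["text"], text_seq, model["hidden_dim"])
    h_social = lstm_encode(model["social"], social_seq, model["hidden_dim"])
    merged = h_text + h_social  # assumed merge: concatenation of both representations
    return sigmoid(affine(model["out"], merged)[0])


if __name__ == "__main__":
    rng = random.Random(1)
    model = build_model()
    # Stand-in embeddings: 5 vectors for the text channel, 3 for the social channel.
    text_seq = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    social_seq = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]
    print(dual_channel_forward(text_seq, social_seq, model))
```

Because the weights are untrained, the printed probability carries no meaning. The sketch only shows how the two feature representations are computed separately and then combined before the final decision.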

