Computer Science (计算机科学) ›› 2021, Vol. 48 ›› Issue (12): 219-225. doi: 10.11896/jsjkx.201100128

• Database & Big Data & Data Science •

Microblog User Interest Recognition Based on Multi-granularity Text Feature Representation

YU You-qin, LI Bi-cheng

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Received: 2020-11-17 Revised: 2021-02-18 Online: 2021-12-15 Published: 2021-11-26
  • Corresponding author: LI Bi-cheng (lbclm@163.com)
  • Author contact: 418846636@qq.com
  • About author: YU You-qin, born in 1993, postgraduate. Her main research interests include user portrait and personalized information recommendation.
    LI Bi-cheng, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include text analysis and understanding, and information fusion.
  • Supported by:
    National Social Science Foundation of China (19BXW110).

Abstract: Discovering the interests of microblog users is important for personalized recommendation in social networks and for correctly guiding information dissemination. This paper proposes a microblog user interest recognition method based on multi-granularity text feature representation. First, text vectors are constructed for microblog users at three levels: the topic layer, the word-order layer, and the vocabulary layer. LDA extracts the topic features of the content, an LSTM learns its semantic features, and the open-source word vectors from Tencent AI Lab supply word-meaning features. Next, the three feature vectors are concatenated into a multi-granularity text feature representation matrix, which is fed into a CNN for text classification training. Finally, the interests of microblog users are recognized through a multi-terminal output layer. Experimental results show that, compared with single-granularity feature representation models, the multi-granularity feature representation model improves precision, recall, and F1 by 8%, 12%, and 13%, respectively. By jointly considering coarse-grained and fine-grained semantic information and word-level information in the text, and combining them with a neural network classification algorithm, the multi-granularity model outperforms the single-granularity models on all evaluation metrics.

Key words: Interest recognition, Social network, Text classification, Text feature, Weibo user
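
The feature-fusion step described in the abstract can be illustrated with a minimal sketch. It assumes, without confirmation from the paper, that each token row of the matrix is the concatenation of that token's word vector, its sequence (LSTM) state, and the document-level topic vector repeated on every row. The function name `build_multi_granularity_matrix` and all numeric values are hypothetical placeholders; real inputs would come from pretrained word embeddings, a trained LSTM, and an LDA model.

```python
from typing import List

Vector = List[float]


def build_multi_granularity_matrix(
    word_vectors: List[Vector],
    sequence_states: List[Vector],
    topic_vector: Vector,
) -> List[Vector]:
    """Concatenate word-, word-order-, and topic-level features per token.

    Assumed layout (not confirmed by the paper): row i is
    [word_vectors[i] ; sequence_states[i] ; topic_vector], with the
    document-level topic vector repeated on every row.
    """
    if len(word_vectors) != len(sequence_states):
        raise ValueError("word_vectors and sequence_states need one row per token")
    return [
        list(w) + list(h) + list(topic_vector)
        for w, h in zip(word_vectors, sequence_states)
    ]


# Toy placeholder values standing in for pretrained word embeddings,
# LSTM hidden states, and an LDA document-topic distribution.
word_vectors = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # 3 tokens, dimension 2
sequence_states = [[1.0], [2.0], [3.0]]               # 3 tokens, dimension 1
topic_vector = [0.7, 0.2, 0.1]                        # 3 topics

matrix = build_multi_granularity_matrix(word_vectors, sequence_states, topic_vector)
```

The resulting matrix has one row per token and a width equal to the sum of the three feature dimensions, which is the kind of 2-D input a CNN text classifier typically convolves over.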

CLC Number:

  • TP391