Computer Science ›› 2021, Vol. 48 ›› Issue (12): 219-225. doi: 10.11896/jsjkx.201100128

• Database & Big Data & Data Science •

Microblog User Interest Recognition Based on Multi-granularity Text Feature Representation

YU You-qin, LI Bi-cheng   

  1. College of Computer Science and Technology, Huaqiao University, Xiamen, Fujian 361021, China
  • Received:2020-11-17 Revised:2021-02-18 Online:2021-12-15 Published:2021-11-26
  • About author: YU You-qin, born in 1993, postgraduate. Her main research interests include user portrait and personalized information recommendation.
    LI Bi-cheng, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include text analysis and understanding, and information fusion.
  • Supported by:
    National Social Science Foundation of China (19BXW110).

Abstract: Discovering the interests of microblog users is important for personalized recommendation in social networks and for guiding information dissemination correctly. We propose a microblog user interest recognition method based on multi-granularity text feature representation. First, a text vector is constructed for each microblog user at three levels: the topic level, the word-order level, and the vocabulary level. LDA extracts the topic features of the content, LSTM learns the semantic features of sentences, and the open-source word vectors of Tencent AI Lab provide the semantic features of words. Then, the multi-granularity text feature representation matrix built from these three feature vectors is fed into a CNN for text classification training. Finally, the interests of Weibo users are recognized through the multi-terminal output layer. Experimental results show that the precision, recall, and F1 value of the multi-granularity feature representation model are improved by 8%, 12%, and 13%, respectively. By considering coarse and fine semantic granularity together with word granularity, and combining them with a neural network classification algorithm, the multi-granularity feature representation model achieves better evaluation metrics than the single-granularity feature representation model.
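
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the three feature vectors are combined, so the layout here (topic and sentence vectors zero-padded to the word-vector width and stacked as extra rows above the word vectors) is an assumption, as are all dimensions and the function names `pad_to`, `build_feature_matrix`, and `conv_max_pool`. Random arrays stand in for the LDA topic distribution, the LSTM sentence vector, the pretrained word vectors, and the trained CNN weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def pad_to(vec, dim):
    """Zero-pad (or truncate) a 1-D vector to length `dim`."""
    out = np.zeros(dim)
    n = min(len(vec), dim)
    out[:n] = vec[:n]
    return out

def build_feature_matrix(topic_vec, sentence_vec, word_vecs):
    """Stack topic, sentence, and word features as rows (assumed layout)."""
    dim = word_vecs.shape[1]
    rows = [pad_to(topic_vec, dim), pad_to(sentence_vec, dim)]
    return np.vstack(rows + [word_vecs])

def conv_max_pool(matrix, filters):
    """Slide each filter over windows of rows, apply ReLU, then max-pool."""
    n_filters, window, _ = filters.shape
    n_positions = matrix.shape[0] - window + 1
    feats = np.empty(n_filters)
    for f in range(n_filters):
        responses = [np.sum(matrix[i:i + window] * filters[f])
                     for i in range(n_positions)]
        # ReLU is monotone, so max(ReLU(x)) equals ReLU(max(x)).
        feats[f] = max(0.0, max(responses))
    return feats

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy stand-ins for the three granularities (all sizes are hypothetical).
topic_vec = rng.dirichlet(np.ones(10))       # topic level: LDA topic distribution
sentence_vec = rng.standard_normal(32)       # word-order level: LSTM sentence vector
word_vecs = rng.standard_normal((20, 50))    # vocabulary level: pretrained word vectors

matrix = build_feature_matrix(topic_vec, sentence_vec, word_vecs)

# Untrained random weights: 8 filters spanning 3 rows, 5 interest classes.
filters = rng.standard_normal((8, 3, 50)) * 0.1
class_weights = rng.standard_normal((5, 8))

feats = conv_max_pool(matrix, filters)
probs = softmax(class_weights @ feats)
print(matrix.shape, probs.round(3))
```

In a real system the filters and classifier weights would be learned from labeled microblog text, and the stand-in vectors would come from trained LDA and LSTM models and the pretrained embedding table.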

Key words: Interest recognition, Social network, Text classification, Text feature, Weibo user

CLC Number: 

  • TP391