Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230500169-8. doi: 10.11896/jsjkx.230500169

• Big Data & Data Science •

User Interest Recognition Method Incorporating Category Labels and Topic Information

KANG Zhiyong, LI Bicheng, LIN Huang

  1. College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
  • Published: 2024-06-06
  • Corresponding author: LI Bicheng (lbclm@163.com)
  • About author: (21014083062@stu.hqu.edu.cn)
    KANG Zhiyong, born in 1998, postgraduate. His main research interests include natural language processing, user portraits and personalized recommendation.
    LI Bicheng, born in 1970, Ph.D., professor, Ph.D. supervisor. His main research interests include intelligent information processing, network ideological security, network public opinion monitoring and guidance, big data analysis and mining.
  • Supported by:
    Joint Fund of Equipment Pre-research and Ministry of Education (8091B022150).

Abstract: Discovering the interests of social network users is important for alleviating information overload, for personalized recommendation, and for positive guidance of information dissemination. Existing interest recognition studies do not simultaneously consider how a text's topic information and its corresponding category label information can help a model learn text features. This paper therefore proposes a user interest recognition method that incorporates category labels and topic information. First, the BERT pre-trained model, a BiLSTM model, and a multi-head self-attention mechanism are used to extract the semantic features of the text and of the label sequence separately. Second, a label attention mechanism is introduced so that the model pays more attention to the words in a text that are most relevant to its category label. Third, text topic features are obtained with the LDA topic model and the Word2Vec model. Fourth, a gating mechanism is designed for feature fusion, allowing the model to fuse the multiple features adaptively and thereby classify Weibo texts into interest categories. Finally, the number of texts published by a user is counted for each interest category, and the category with the highest count is taken as that user's interest recognition result. To verify the effectiveness of the proposed method, a Weibo interest recognition dataset is constructed. Experimental results show that the model achieves the best performance on both Weibo text interest classification and user interest recognition.

Key words: Social network, Interest recognition, Topic model, Label attention mechanism, Feature fusion
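
The last two steps described in the abstract, gated feature fusion and count-based user interest identification, can be illustrated with a minimal Python sketch. The abstract does not give the gate's formula, so the sketch assumes a common element-wise form, g = sigmoid(w_sem * s + w_top * t + b) and fused = g * s + (1 - g) * t, with hypothetical per-dimension weights. The function names, category names and sample values are likewise illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def gated_fusion(semantic, topic, w_sem, w_top, bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (the abstract does not specify the gate):
        g_i     = sigmoid(w_sem_i * s_i + w_top_i * t_i + bias_i)
        fused_i = g_i * s_i + (1 - g_i) * t_i
    """
    if not (len(semantic) == len(topic) == len(w_sem) == len(w_top) == len(bias)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, t, ws, wt, b in zip(semantic, topic, w_sem, w_top, bias):
        gate = 1.0 / (1.0 + math.exp(-(ws * s + wt * t + b)))
        fused.append(gate * s + (1.0 - gate) * t)
    return fused


def identify_user_interest(predicted_categories):
    """Return the interest category assigned to most of a user's texts."""
    if not predicted_categories:
        raise ValueError("the user has no classified texts")
    counts = Counter(predicted_categories)
    # most_common(1) returns [(category, count)]; ties go to the first seen
    return counts.most_common(1)[0][0]


# Zero weights and bias give a gate of 0.5, so the fused vector is the
# element-wise average of the two inputs.
fused = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

# Hypothetical per-text predictions for one user.
user_interest = identify_user_interest(["sports", "tech", "sports"])
```

In a trained model the gate weights would be learned jointly with the classifier. The voting step needs only the per-text predicted categories, so it works with any text classifier.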

CLC Number:

  • TP391