Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230500169-8. doi: 10.11896/jsjkx.230500169
KANG Zhiyong, LI Bicheng, LIN Huang
Abstract: Discovering the interests of social network users matters for mitigating information overload, for personalized recommendation, and for steering information diffusion in a positive direction. Existing interest-recognition studies do not jointly exploit the topic information of a text and its category-label information to help the model learn text features. This paper therefore proposes a user interest recognition method that incorporates category labels and topic information. First, the pretrained BERT model, a BiLSTM, and multi-head self-attention are used to extract semantic features of the text and of the label sequence, respectively. Second, a label attention mechanism is introduced so that the model attends more to the words in a text that are most relevant to its category label. Then, topic features of the text are obtained with an LDA topic model and Word2Vec. Next, a gating mechanism is designed for feature fusion, allowing the model to combine the multiple feature types adaptively and thereby classify microblog texts into interest categories. Finally, the number of a user's posts falling into each interest category is counted, and the category with the largest count is taken as the user's recognized interest. To validate the proposed method, a Weibo interest-recognition dataset is constructed. Experimental results show that the model achieves the best performance on both microblog text interest classification and user interest recognition.
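The three model components named in the abstract — label attention over token features, gated fusion of semantic and topic features, and the final majority vote over a user's posts — can be sketched as below. This is a minimal illustration with NumPy and random stand-in vectors, not the paper's implementation: the feature extractors (BERT, BiLSTM, LDA, Word2Vec) are replaced by placeholder arrays, and all dimensions and weights are hypothetical.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def label_attention(H, c):
    """Weight token features H (L x d) by their relevance to a label embedding c (d,)."""
    scores = softmax(H @ c)   # one attention weight per token
    return scores @ H         # (d,) label-aware text representation

def gated_fusion(s, t, W, b):
    """Adaptively mix semantic features s and topic features t (both d-dim) with a learned gate."""
    g = 1.0 / (1.0 + np.exp(-(W @ np.concatenate([s, t]) + b)))  # sigmoid gate in (0,1)^d
    return g * s + (1.0 - g) * t

def user_interest(post_categories):
    """User-level result: the interest category assigned to most of the user's posts."""
    return Counter(post_categories).most_common(1)[0][0]

d, L = 8, 5
H = rng.normal(size=(L, d))              # stand-in for BERT + BiLSTM token features
c = rng.normal(size=d)                   # stand-in for the label-sequence embedding
t = rng.normal(size=d)                   # stand-in for LDA + Word2Vec topic features
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)

s = label_attention(H, c)
fused = gated_fusion(s, t, W, b)         # would be fed to the interest classifier
print(fused.shape)                       # (8,)
print(user_interest(["sports", "tech", "sports"]))
```

In the actual model the gate parameters `W`, `b` are learned end-to-end, so the network itself decides, per dimension, how much to trust the label-aware semantic features versus the topic features.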