Computer Science (计算机科学), 2024, Vol. 51, Issue (6A): 230800093-7. doi: 10.11896/jsjkx.230800093
HUANG Yumin (黄玉民), ZHAO Chanchan (赵婵婵)
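Abstract: To address the problem of quickly abstracting accurate user interests from massive volumes of short-video data, user data, and interaction data, this paper proposes a method for constructing three-level-label user portraits based on topic models. Following a topic-construction approach, the video topic words obtained from a fused LDA and GSDMM topic model serve as the user interest expression vector. First, an LDA filter is built that compares against a threshold to remove text unrelated to the topic, reducing the size of the text and lessening the influence of non-primary corpus material on the generation of the interest expression vector. Next, a method is proposed for constructing a feature-word weight matrix that combines semantic and contextual information: a Bi-GRU neural network computes the contextual features of the word vectors, which serve as the context features; the term-frequency weights computed by the TF-IDF algorithm serve as the semantic features; and the two are combined to enrich the meaning of the feature words. Finally, a GSDMM model with interest-weight allocation learns the feature-vector weight matrix, generating user interest labels and correcting the interest weights according to users' differing degrees of preference. Experimental results show that the method characterizes user portraits fairly completely and accurately, outperforms single topic-construction methods, and performs well in clustering. Building a complete user portrait makes it possible to pinpoint user pain points and supports subsequent personalized recommendation.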
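As a rough illustration of the clustering stage named in the abstract, the following is a minimal sketch of plain GSDMM (collapsed Gibbs sampling for a Dirichlet multinomial mixture over short texts). It is not the authors' variant: the interest-weight allocation, the LDA filter, and the Bi-GRU/TF-IDF weight matrix are all omitted. The function name `gsdmm`, the hyperparameter defaults, and the toy `docs` corpus are assumptions made for this example only.

```python
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, iters=15, seed=0):
    """Plain GSDMM sketch. docs is a list of token lists.

    Returns one cluster label (0..K-1) per document.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})   # vocabulary size
    D = len(docs)
    m = [0] * K                             # documents per cluster
    n = [0] * K                             # tokens per cluster
    nw = [Counter() for _ in range(K)]      # word counts per cluster
    labels = []

    def move(doc, z, sign):
        # Add (sign=+1) or remove (sign=-1) a document from cluster z.
        m[z] += sign
        n[z] += sign * len(doc)
        for w in doc:
            nw[z][w] += sign

    # Random initial assignment.
    for doc in docs:
        z = rng.randrange(K)
        labels.append(z)
        move(doc, z, +1)

    for _ in range(iters):
        for i, doc in enumerate(docs):
            move(doc, labels[i], -1)        # take the document out
            weights = []
            for z in range(K):
                # Cluster popularity term.
                p = (m[z] + alpha) / (D - 1 + K * alpha)
                # Word similarity term, one factor per token.
                seen = Counter()
                for pos, w in enumerate(doc):
                    p *= (nw[z][w] + beta + seen[w]) / (n[z] + V * beta + pos)
                    seen[w] += 1
                weights.append(p)
            labels[i] = rng.choices(range(K), weights=weights)[0]
            move(doc, labels[i], +1)        # put it back in the sampled cluster
    return labels


# Hypothetical toy corpus of tokenized short-video titles.
docs = [
    ["cat", "pet", "cute"],
    ["dog", "pet", "cute"],
    ["goal", "football", "match"],
    ["match", "football", "team"],
]
labels = gsdmm(docs, K=4)
```

In this sketch `K` is only an upper bound on the number of clusters; clusters that end up empty are simply never used. For long documents the product of per-token factors can underflow, so a practical version would work in log space.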