Computer Science ›› 2018, Vol. 45 ›› Issue (1): 157-161. doi: 10.11896/j.issn.1002-137X.2018.01.027
LI Heng-chao, LIN Hong-fei, YANG Liang, XU Bo, WEI Xiao-cong, ZHANG Shao-wu and Gulziya ANIWAR
Abstract: A user profile is a labeled user model abstracted from information such as a user's social attributes, living habits, and consumption behavior; the core task in building one is assigning "tags" to users. Based on users' query-word histories, this paper proposes a two-level fusion framework for predicting multi-dimensional user tags. At the first level, multiple models are built for each tag-prediction subtask: traditional machine-learning methods combined with trigram features capture differences in users' wording habits, the shallow doc2vec neural network model captures semantic associations among query words, and a convolutional neural network captures deeper semantic associations between queries. Experiments show that doc2vec achieves comparatively good prediction accuracy on short-text tasks such as user queries. At the second level, targeting the multi-label nature of user profiling, the XGBTree model and stacking-based multi-model fusion extract the correlations among users' tag attributes, further improving average prediction accuracy by about 2%. With this two-level fusion framework, the authors won first place among 894 teams in the competition "Sogou User Profile Mining in Big-Data Precision Marketing" organized by the China Computer Federation (CCF) in 2016.
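The two-level fusion described above hinges on generating level-1 predictions out-of-fold, so the level-2 model never sees a prediction made by a model trained on the same sample. The sketch below illustrates only that stacking mechanism with two deliberately toy base learners (`MajorityClass`, `NearestCentroid1D`) standing in for the paper's trigram/doc2vec/CNN models; all names and the contiguous k-fold split are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of out-of-fold stacking (level 1 of a two-level fusion).
# The base learners here are hypothetical toys; in the paper they would be
# trigram+classic-ML, doc2vec and CNN classifiers, with XGBTree at level 2.
from collections import Counter


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (simplified; no shuffling)."""
    folds, size = [], n // k
    for i in range(k):
        start = i * size
        end = n if i == k - 1 else start + size
        folds.append(list(range(start, end)))
    return folds


class MajorityClass:
    """Toy learner: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid1D:
    """Toy learner: classify a 1-D feature by the nearest class mean."""
    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            sums[label] = sums.get(label, 0.0) + x
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: sums[c] / counts[c] for c in sums}
        return self

    def predict(self, X):
        return [min(self.centroids, key=lambda c: abs(x - self.centroids[c]))
                for x in X]


def out_of_fold_predictions(model_classes, X, y, k=3):
    """For each sample, collect predictions from models trained only on the
    other folds; the resulting meta-features feed the level-2 model without
    leaking training labels."""
    n = len(X)
    meta = [[None] * len(model_classes) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        for m, model_cls in enumerate(model_classes):
            model = model_cls().fit([X[i] for i in train_idx],
                                    [y[i] for i in train_idx])
            for i, pred in zip(fold, model.predict([X[i] for i in fold])):
                meta[i][m] = pred
    return meta
```

In a full pipeline, `meta` (one column per base model, optionally per tag) would be concatenated across the profile's tag subtasks and passed to the level-2 learner, which is where the paper reports the ~2% gain from modeling correlations among tag attributes.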