Computer Science ›› 2018, Vol. 45 ›› Issue (1): 157-161.doi: 10.11896/j.issn.1002-137X.2018.01.027


Two-level Stacking Algorithm Framework for Building User Portrait

LI Heng-chao, LIN Hong-fei, YANG Liang, XU Bo, WEI Xiao-cong, ZHANG Shao-wu and Gulziya ANIWAR   

Online: 2018-01-15    Published: 2018-11-13

Abstract: A user portrait is a tagged user model constructed from a user's social attributes, lifestyle, consumer behavior, and similar signals. The core task in building user portraits is to "tag" the user. Based on users' query-word histories, this paper proposes a two-level stacking algorithm framework for predicting users' multi-dimensional labels. At the first level, multiple models are built for each tag-prediction subtask: an SVM model with trigram features captures differences in users' wording habits, a doc2vec shallow neural network extracts semantic relations among the query words, and a convolutional neural network extracts deeper semantic associations between query words. Experiments show that doc2vec achieves relatively good predictive accuracy on tasks involving short texts, such as user queries. At the second level, an XGBTree model combined with the stacking method extracts associations among a user's label attributes, further improving average prediction accuracy by 2%. In the 2016 big data competition "Sougou User Portrait Mining for Precision Marketing", organized by the China Computer Federation, this two-level stacking framework won the championship among 894 teams.

Key words: User portraits, Tag prediction, Short text classification, Multi-model ensemble

