Computer Science, 2022, Vol. 49, Issue (2): 256-264. doi: 10.11896/jsjkx.201200082

• Artificial Intelligence •

Improved Topic Sentiment Model with Word Embedding Based on Gaussian Distribution

LI Yu-qiang¹, ZHANG Wei-jiang¹, HUANG Yu¹, LI Lin¹, LIU Ai-hua²

  1 School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China
  2 School of Energy and Power Engineering, Wuhan University of Technology, Wuhan 430063, China
  • Received: 2020-12-08 Revised: 2021-03-13 Online: 2022-02-15 Published: 2022-02-23
  • About author: LI Yu-qiang, born in 1977, Ph.D., associate professor, master's supervisor. His main research interests include machine learning and big data analysis.
    ZHANG Wei-jiang, born in 1994, postgraduate. His main research interests include machine learning and big data analysis.
  • Supported by:
    National Social Science Foundation of China (15BGL048).

Abstract: In recent years, the topic sentiment model, an important line of research in unsupervised learning, has been used in text topic mining and sentiment analysis. However, Weibo posts challenge topic sentiment models because they are short and structurally incomplete. This paper therefore studies and improves topic sentiment models for Weibo. We introduce word vector technology into the popular TSMMF model (topic sentiment model based on multi-feature fusion): a multivariate Gaussian distribution is used to quickly sample neighboring words from the word embedding space, and these sampled words replace the words generated by the Dirichlet-multinomial distribution. In this way, words with low co-occurrence frequency and little information are transformed into words with a prominent topic and clear information. In addition, a nearest neighbor search algorithm is used to further improve the running speed of the model when processing large-scale Weibo corpora. The resulting model is called GWE-TSMMF. The experimental results show that the average F1 value of the GWE-TSMMF model is about 0.718, and that its sentiment polarity analysis is better than that of the original model and of the existing mainstream word embedding topic sentiment models (WS-TSWE and HST-SCW).
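
The word-replacement idea described in the abstract can be illustrated with a minimal sketch. It is not the GWE-TSMMF implementation: the toy vocabulary, the 2-D vectors, the isotropic covariance (a special case of a multivariate Gaussian), the brute-force search, and the helper names `sample_gaussian`, `nearest_word`, and `replace_word` are all assumptions made for illustration.

```python
import math
import random

# Toy 2-D embeddings; the words and vectors are made up for illustration.
EMBEDDINGS = {
    "happy": (0.9, 0.8),
    "glad": (0.85, 0.75),
    "sad": (-0.9, -0.8),
    "gloomy": (-0.85, -0.7),
    "phone": (0.1, -0.9),
}


def sample_gaussian(mean, sigma, rng):
    """Draw one point from an isotropic Gaussian N(mean, sigma^2 * I)."""
    return tuple(rng.gauss(m, sigma) for m in mean)


def nearest_word(point, embeddings):
    """Return the vocabulary word closest to `point` (Euclidean distance).

    This is a brute-force scan; a large corpus would need an indexed
    nearest neighbor search instead.
    """
    return min(embeddings, key=lambda w: math.dist(point, embeddings[w]))


def replace_word(word, embeddings, sigma=0.05, rng=None):
    """Sample around the word's embedding and map the sample back to a word."""
    rng = rng or random.Random()
    point = sample_gaussian(embeddings[word], sigma, rng)
    return nearest_word(point, embeddings)


if __name__ == "__main__":
    rng = random.Random(0)
    print(replace_word("happy", EMBEDDINGS, sigma=0.05, rng=rng))
```

A larger `sigma` makes it more likely that the sampled point falls closer to a neighboring word than to the original one. How the model actually sets the mean and covariance of the distribution is not stated in the abstract.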

Key words: Gaussian distribution, Topic sentiment model, Weibo sentiment polarity analysis, Word embedding

CLC Number: TP391