Computer Science, 2022, Vol. 49, Issue (2): 256-264. doi: 10.11896/jsjkx.201200082
LI Yu-qiang1, ZHANG Wei-jiang1, HUANG Yu1, LI Lin1, LIU Ai-hua2
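Abstract: In recent years, joint topic-sentiment models have become an important research topic in unsupervised learning, with practical applications in text topic mining and sentiment analysis. In real-world settings, however, microblog posts are short and structurally incomplete, which poses challenges for joint topic-sentiment models. This work therefore studies and improves topic-sentiment models for microblogs. Word embedding techniques are introduced into TSMMF (Topic Sentiment Model Based on Multi-feature Fusion), a currently popular topic-sentiment model: a multivariate Gaussian distribution is used to quickly sample neighboring words from the word-embedding space, and these replace the words generated by the original Dirichlet-multinomial distribution, so that words with low co-occurrence frequency and little information are turned into words that highlight the topic and carry clear information. A nearest-neighbor search algorithm is also used to further speed up the model on large microblog corpora. The resulting model is called GWE-TSMMF. Comparative experiments show that GWE-TSMMF achieves an average F1 score of about 0.718, and that its analysis of microblog sentiment polarity improves significantly over both the original model and existing mainstream word-embedding topic-sentiment models (WS-TSWE and HST-SCW).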
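The word-replacement step described in the abstract can be illustrated with a minimal sketch. It is not the authors' GWE-TSMMF implementation: the toy vocabulary, the 2-dimensional embeddings, the covariance value, and the helper names `nearest_word` and `resample_word` are all assumptions made for illustration. The sketch draws a point from a multivariate Gaussian centred on a word's embedding and maps it back to the closest vocabulary word. A brute-force scan stands in for the nearest-neighbor search; on a large corpus an index structure would presumably take its place.

```python
import numpy as np


def nearest_word(query, embeddings):
    """Return the row index of the embedding closest to `query` (Euclidean distance)."""
    dists = np.linalg.norm(embeddings - query, axis=1)
    return int(np.argmin(dists))


def resample_word(word_idx, embeddings, cov, rng):
    """Sample a point from N(embeddings[word_idx], cov) and snap it to the nearest word.

    Hypothetical helper: the paper's actual sampling scheme and parameters
    are not given in the abstract.
    """
    sample = rng.multivariate_normal(embeddings[word_idx], cov)
    return nearest_word(sample, embeddings)


# Toy data (assumed): two positive and two negative words in a 2-D space.
vocab = ["good", "great", "bad", "awful"]
embeddings = np.array([
    [1.0, 1.0],
    [1.2, 0.9],
    [-1.0, -1.0],
    [-1.1, -0.8],
])
cov = 0.05 * np.eye(2)           # assumed isotropic covariance
rng = np.random.default_rng(0)   # fixed seed for repeatability

new_idx = resample_word(0, embeddings, cov, rng)
print(vocab[0], "->", vocab[new_idx])
```

With a small covariance the replacement stays within the same semantic neighborhood, which is the intuition behind turning sparse words into more topic-relevant ones; a larger covariance would allow more distant substitutions.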