Computer Science (计算机科学), 2020, Vol. 47, Issue (11): 275-279. doi: 10.11896/jsjkx.191000174
MA Xiao-hui¹, JIA Jun-zhi², ZHOU Xiang-zhen³, YAN Jun-ya¹ (马晓慧, 贾君枝, 周湘贞, 闫俊伢)
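Abstract: Sentiment lexicons support sentiment analysis because a text can be classified by matching its words against lexicon entries. However, sentiment lexicons are limited in vocabulary coverage and domain adaptability. To address this, this paper proposes a sentiment classification method based on semantic similarity measurement and embedding representation. The method computes the semantic similarity between the text to be classified and the sentiment lexicon, then combines these semantic distances with embedding-based features for sentiment classification, which helps remedy the insufficient use of semantic features. Sentiment classification performance is evaluated with feature vectors extracted by three approaches: word vectors, sentiment-lexicon matching, and the proposed method. Experimental results show that the proposed method outperforms the comparison methods overall. On three e-commerce review test corpora, the proposed method reaches an average F1 of 83.46%, an improvement of 8.26% over the comparison methods. The semantic features extracted by combining word embeddings with ECSD (E-Commerce Sentiment Dictionary) give the best classification results, with a performance gain of 9%. This shows that incorporating semantic similarity enriches the extracted sentiment-semantic features and effectively improves sentiment classification performance.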
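The abstract does not state the exact similarity measure, lexicon format, or classifier, so the Python below is only a minimal sketch, under stated assumptions, of how a text-to-lexicon similarity feature can be concatenated with an embedding feature. It assumes pre-trained word vectors held in a plain dict, a lexicon given as a polarity-to-word-list mapping (a hypothetical stand-in for ECSD), cosine similarity as the semantic measure, and mean pooling for the embedding feature. The names `cosine`, `mean_embedding`, `lexicon_similarity`, and `combined_features` are illustrative and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_embedding(tokens: List[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Embedding feature: average of the vectors of in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def lexicon_similarity(tokens: List[str], words: List[str],
                       emb: Dict[str, Vector]) -> float:
    """Semantic-similarity feature for one polarity (assumed design):
    each token is scored by its best cosine match among the lexicon words,
    and the scores are averaged over the tokens that have vectors."""
    lex_vecs = [emb[w] for w in words if w in emb]
    if not lex_vecs:
        return 0.0
    scores = [max(cosine(emb[t], lv) for lv in lex_vecs)
              for t in tokens if t in emb]
    return sum(scores) / len(scores) if scores else 0.0


def combined_features(tokens: List[str], emb: Dict[str, Vector],
                      lexicon: Dict[str, List[str]], dim: int) -> Vector:
    """Concatenate the embedding feature with one similarity score per
    polarity (polarities taken in sorted order for a stable layout)."""
    sims = [lexicon_similarity(tokens, lexicon[p], emb) for p in sorted(lexicon)]
    return mean_embedding(tokens, emb, dim) + sims


if __name__ == "__main__":
    # Toy 2-d vectors and a tiny lexicon, for illustration only.
    emb = {
        "great": [0.9, 0.1], "good": [1.0, 0.0],
        "bad": [-1.0, 0.0], "awful": [-0.9, -0.1],
        "quality": [0.0, 1.0],
    }
    lexicon = {"neg": ["bad", "awful"], "pos": ["good"]}
    # "great" is not in the lexicon, but its vector is close to "good".
    print(combined_features(["great", "quality"], emb, lexicon, dim=2))
```

In the toy run, "great" never appears in the lexicon yet still raises the positive-polarity score through its nearest lexicon neighbor, which illustrates the coverage gap that plain word matching leaves. The resulting vector could be passed to any standard classifier; which classifier the authors used is not stated in the abstract.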