Computer Science, 2020, Vol. 47, Issue (11): 275-279. doi: 10.11896/jsjkx.191000174

• Artificial Intelligence •

Semantic Similarity-based Method for Sentiment Classification

MA Xiao-hui1, JIA Jun-zhi2, ZHOU Xiang-zhen3, YAN Jun-ya1

  1 Information Faculty, Business College of Shanxi University, Taiyuan 030031, China
  2 School of Information Resource Management, Renmin University of China, Beijing 100872, China
  3 National Academy of Economic Strategy, Chinese Academy of Social Sciences, Beijing 100028, China
  • Received: 2019-10-27 Revised: 2019-12-19 Online: 2020-11-15 Published: 2020-11-05
  • Corresponding author: MA Xiao-hui (mxh1112@163.com)
  • About author: MA Xiao-hui, born in 1982, master, associate professor. Her main research interests include information retrieval, computer application technology and sentiment analysis.
  • Supported by:
    This work was supported by the Key Program of Shanxi Provincial Department of Science and Technology (201603D321112), 13th Five-year Plan of Shanxi Provincial Education Department (GH-17097), Young Scientists Fund of the National Natural Science Foundation of China (61702026) and 2018 Science and Technology Research Project of Henan Province (182102110277).

Abstract: Sentiment lexicons support sentiment analysis by allowing sentiment to be classified through word matching. However, they are limited in vocabulary coverage and domain adaptation. This paper therefore proposes a sentiment classification method based on semantic similarity measurement and embedding representations. The method computes the semantic similarity between the text to be classified and a sentiment lexicon, and combines semantic distance with embedding-based features for classification, which helps address the insufficient use of semantic features. Classification performance is evaluated using feature vectors extracted from word vectors, from sentiment lexicon matching, and by the proposed method. Experimental results show that the proposed method outperforms the comparison methods overall. On three e-commerce review test corpora, the proposed method reaches an average F1 of 83.46%, an improvement of 8.26% over the comparison methods. The semantic features extracted by combining word embeddings with ECSD (E-Commerce Sentiment Dictionary) give the best classification results, with a performance improvement of 9%. This indicates that incorporating semantic similarity enriches the extracted sentiment semantic features and effectively improves sentiment classification performance.

Key words: Feature selection, Semantic similarity, Sentiment classification, Sentiment lexicon, Word embedding

CLC Number:

  • TP391
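
The feature construction described in the abstract (semantic similarity between the text and a sentiment lexicon, combined with embedding-based features) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how similarities are aggregated, so taking each token's maximum cosine similarity to a polarity's lexicon words and then averaging over tokens is an assumption. The names `EMB`, `LEXICON`, `mean_embedding`, `lexicon_similarity` and `extract_features` are hypothetical, and the 2-dimensional embeddings are toy values, not taken from ECSD or from any trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_embedding(tokens, emb, dim):
    """Embedding-based feature: average vector of the in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def lexicon_similarity(tokens, emb, lexicon_words):
    """Assumed aggregation: mean over tokens of the max similarity to any lexicon word."""
    lex_vecs = [emb[w] for w in lexicon_words if w in emb]
    scores = [
        max(cosine(emb[t], lv) for lv in lex_vecs)
        for t in tokens
        if t in emb and lex_vecs
    ]
    return sum(scores) / len(scores) if scores else 0.0


def extract_features(tokens, emb, lexicon, dim):
    """Concatenate the mean embedding with one similarity score per polarity."""
    features = mean_embedding(tokens, emb, dim)
    for polarity in sorted(lexicon):  # fixed order: "negative", then "positive"
        features.append(lexicon_similarity(tokens, emb, lexicon[polarity]))
    return features


# Toy data (hypothetical): "great" is absent from the lexicon but close to "good".
EMB = {
    "good": [1.0, 0.0],
    "great": [0.9, 0.1],
    "bad": [0.0, 1.0],
    "awful": [0.1, 0.9],
    "phone": [0.5, 0.5],
}
LEXICON = {"negative": ["bad"], "positive": ["good"]}

features = extract_features(["great", "phone"], EMB, LEXICON, dim=2)
print(features)  # [mean embedding..., negative similarity, positive similarity]
```

In this toy example the review contains no lexicon word, so exact word matching would find nothing, yet the positive similarity score still exceeds the negative one because "great" lies close to "good" in the embedding space. The resulting vector could then be passed to any standard classifier.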