Computer Science ›› 2020, Vol. 47 ›› Issue (11A): 78-82. doi: 10.11896/jsjkx.200400061

• Artificial Intelligence •

Sentiment Classification of Network Reviews Combining Extended Dictionary and Self-supervised Learning

JING Li, LI Man-man, HE Ting-ting

  1. School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450046, China
  • Online: 2020-11-15  Published: 2020-11-17
  • Corresponding author: LI Man-man (lmm18037193250@163.com)
  • About author: JING Li, born in 1971, Ph.D., professor, is a member of China Computer Federation. Her main research interests include artificial intelligence and information security.
    LI Man-man, born in 1992, postgraduate. Her main research interests include data analysis, data mining and natural language processing.
    E-mail listed in the author profile: 1286041400@qq.com
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61806073, 31700858, 61802110).

Abstract: In the rapidly developing Internet era, sentiment analysis of online reviews plays an important role in analyzing public opinion and monitoring e-commerce. Existing classification methods fall mainly into sentiment dictionary methods and machine learning methods. The sentiment dictionary method depends heavily on the sentiment words in the dictionary: the more complete the dictionary and the more pronounced the sentiment tendency of a review, the better the classification, but for reviews whose sentiment tendency is hard to distinguish it performs poorly. The machine learning method is supervised, and its performance relies on a large pre-annotated corpus; at present the corpus is annotated manually, which is extremely labor-intensive. This paper combines the characteristics of the two methods to build a sentiment classification model for network reviews. First, the sentiment dictionary is expanded with online reviews from the relevant domain, and the sentiment value of each review is calculated with the extended dictionary. According to a preset sentiment threshold, the reviews with a significant sentiment tendency, which the dictionary classifies with higher accuracy, are selected as the definite set, and the remaining reviews that are not easily distinguished form the uncertain set. The classification result of the definite set is determined directly by the sentiment value. Second, a classifier is trained on the definite set through self-supervised learning, so the training data require no manual annotation. Finally, the trained classifier reclassifies the uncertain set, and an improved algorithm is used to refine the classification result of the uncertain set. Experiments show that, compared with the sentiment dictionary method and the machine learning method, the proposed model achieves better sentiment classification on two datasets, hotel reviews and Jingdong reviews.

Key words: Internet reviews, Machine learning, Sentiment classification, Sentiment dictionary, Word vectors

CLC Number:

  • TP391.1
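
The three-step pipeline described in the abstract (score each review with the dictionary, split by a threshold into a definite and an uncertain set, then train a classifier on the definite set and apply it to the uncertain set) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy lexicon, the threshold value of 2.0, whitespace-tokenized English input, and the choice of a small hand-written Naive Bayes classifier are all assumptions, since the abstract does not specify them. The dictionary expansion step and the "improved algorithm" used to refine the uncertain set are not described in enough detail to reproduce and are left out.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> polarity weight); not the paper's dictionary.
LEXICON = {"good": 1.0, "great": 2.0, "clean": 1.0,
           "bad": -1.0, "dirty": -1.0, "terrible": -2.0}


def sentiment_value(tokens, lexicon=LEXICON):
    """Sum the polarity weights of the lexicon words found in a review."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def split_by_threshold(reviews, threshold=2.0):
    """Return (definite, uncertain): reviews with |score| >= threshold are
    labeled by the sign of the score; the rest stay unlabeled."""
    definite, uncertain = [], []
    for tokens in reviews:
        score = sentiment_value(tokens)
        if abs(score) >= threshold:
            definite.append((tokens, 1 if score > 0 else -1))
        else:
            uncertain.append(tokens)
    return definite, uncertain


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, labeled):
        self.word_counts = {1: Counter(), -1: Counter()}
        self.doc_counts = Counter()
        for tokens, label in labeled:
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
        self.vocab = set(self.word_counts[1]) | set(self.word_counts[-1])
        return self

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label in (1, -1):
            # Smoothed log prior, so a class with no documents never yields log(0).
            logp = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words unseen in training are ignored
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:  # ties resolve to the positive label
                best_label, best_logp = label, logp
        return best_label


def classify(reviews, threshold=2.0):
    """Label the definite set by dictionary score, then label the uncertain
    set with a classifier trained only on the definite set."""
    definite, uncertain = split_by_threshold(reviews, threshold)
    model = NaiveBayes().fit(definite)
    return definite + [(tokens, model.predict(tokens)) for tokens in uncertain]


if __name__ == "__main__":
    reviews = [
        "great clean room breakfast".split(),
        "terrible dirty room noisy".split(),
        "breakfast good".split(),
        "noisy bad".split(),
    ]
    for tokens, label in classify(reviews):
        print(label, " ".join(tokens))
```

In this sketch the classifier never sees a human-assigned label: its training labels come entirely from the dictionary scores of the definite set, which is the sense in which the abstract calls the training self-supervised. Real Chinese reviews would additionally need word segmentation before scoring.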