Computer Science ›› 2016, Vol. 43 ›› Issue (7): 234-239. doi: 10.11896/j.issn.1002-137X.2016.07.042

• Artificial Intelligence •

Cross-domain Sentiment Classification Based on Optimizing Classification Model Progressively

ZHANG Jun and WANG Su-ge

  1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006; Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
  • Online: 2018-12-01  Published: 2018-12-01
  • Supported by:
    This work was supported by the National Natural Science Foundation of China.

Abstract: Cross-domain text sentiment classification has become a research hotspot in natural language processing. Traditional active learning cannot exploit related information between domains, and the bag-of-words model cannot filter out words unrelated to sentiment classification. To address both problems, this paper proposes a cross-domain text sentiment classification method based on progressively optimizing the classification model. First, sentiment words common to the source and target domains are selected as features, and a classification model is trained on the labeled source domain. The model then assigns initial category labels to the target-domain texts, and the texts with high confidence are selected as the initial seed samples of the classification model. To speed up optimization of the target-domain model, at each iteration the low-confidence texts are given to experts for annotation, and the annotated texts are added to the training set together with the high-confidence texts. The feature set is then dynamically re-extracted from the training set according to a sentiment dictionary, extraction rules for evaluation-word collocations, and auxiliary feature words. Experimental results show that the method not only effectively improves cross-domain sentiment classification but also reduces the cost of manually annotating samples to some extent.

Key words: Sentiment classification, Cross-domain, Classification model, Feature extraction, Confidence
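The iterative procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier, so a small multinomial naive Bayes is used as a stand-in; the names `adapt`, `oracle`, `high` and `n_query` are hypothetical; and feature re-extraction is reduced to intersecting the training vocabulary with a sentiment lexicon, leaving out the collocation rules and auxiliary feature words the paper mentions.

```python
import math
from collections import Counter


def common_sentiment_words(source_docs, target_docs, lexicon):
    """Sentiment words from the lexicon that occur in both domains."""
    source_vocab = {w for doc in source_docs for w in doc}
    target_vocab = {w for doc in target_docs for w in doc}
    return source_vocab & target_vocab & set(lexicon)


class NaiveBayes:
    """Multinomial naive Bayes over a fixed feature set (assumed stand-in classifier)."""

    def __init__(self, features):
        self.features = set(features)

    def fit(self, docs, labels):
        class_counts = Counter(labels)
        self.classes = sorted(class_counts)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            word_counts[label].update(w for w in doc if w in self.features)
        size = len(self.features)
        self.log_like = {}
        for c in self.classes:
            total = sum(word_counts[c].values())
            # Laplace smoothing keeps unseen feature words from zeroing a class.
            self.log_like[c] = {w: math.log((word_counts[c][w] + 1) / (total + size))
                                for w in self.features}
        return self

    def predict_proba(self, doc):
        scores = {c: self.log_prior[c]
                  + sum(self.log_like[c][w] for w in doc if w in self.features)
                  for c in self.classes}
        top = max(scores.values())
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


def adapt(source_docs, source_labels, target_docs, lexicon, oracle,
          rounds=3, high=0.75, n_query=1):
    """Progressively refit a target-domain model (hypothetical helper)."""
    features = common_sentiment_words(source_docs, target_docs, lexicon)
    model = NaiveBayes(features).fit(source_docs, source_labels)
    labeled = {}  # target index -> label accepted so far
    for _ in range(rounds):
        scored = []
        for i, doc in enumerate(target_docs):
            if i not in labeled:
                proba = model.predict_proba(doc)
                label = max(proba, key=proba.get)
                scored.append((proba[label], i, label))
        if not scored:
            break
        scored.sort()  # least confident predictions first
        for _, i, _ in scored[:n_query]:
            labeled[i] = oracle(i)  # low confidence: ask the expert
        for conf, i, label in scored[n_query:]:
            if conf >= high:
                labeled[i] = label  # high confidence: trust the model
        train_docs = [target_docs[i] for i in labeled]
        train_labels = [labeled[i] for i in labeled]
        # Simplified dynamic feature extraction: lexicon words in the training set.
        features = {w for doc in train_docs for w in doc} & set(lexicon)
        model = NaiveBayes(features).fit(train_docs, train_labels)
    return model, labeled


# Toy data (invented for illustration only).
source_docs = [["good", "plot"], ["great", "story"], ["bad", "plot"], ["poor", "ending"]]
source_labels = ["pos", "pos", "neg", "neg"]
target_docs = [["good", "battery"], ["bad", "screen"], ["great", "sound", "good"],
               ["poor", "battery", "bad"], ["fast", "shipping"]]
truth = ["pos", "neg", "pos", "neg", "pos"]
lexicon = {"good", "great", "bad", "poor", "fast", "slow"}
asked = []


def oracle(i):
    asked.append(i)  # record which texts needed manual annotation
    return truth[i]


model, labeled = adapt(source_docs, source_labels, target_docs, lexicon, oracle)
```

On this toy data only two of the five target texts are sent to the oracle, and the target-specific word "fast", absent from the source domain, enters the feature set once the training set contains target texts. That is the intuition behind re-extracting features at each iteration rather than fixing them from the source domain.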

