Computer Science ›› 2021, Vol. 48 ›› Issue (9): 118-124. doi: 10.11896/jsjkx.210400280

Special Issue: Intelligent Data Governance Technologies and Systems


Public Opinion Sentiment Big Data Analysis Ensemble Method Based on Spark

DAI Hong-liang1, ZHONG Guo-jin1, YOU Zhi-ming1, DAI Hong-ming2

  1. School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China
  2. School of Software, South China University of Technology, Guangzhou 510006, China
  • Received: 2021-04-26  Revised: 2021-06-16  Online: 2021-09-15  Published: 2021-09-10
  • About author: DAI Hong-liang, born in 1978, Ph.D., professor, postdoctoral supervisor, is a member of China Computer Federation. His main research interests include machine learning and big data analysis.
    DAI Hong-ming, born in 1978, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include machine learning, big data analysis, and software engineering.
  • Supported by:
    National Social Science Foundation (18BTJ029)

Abstract: With the development of mobile Internet technology, social media has become the main channel through which the public shares views and expresses emotions. Sentiment analysis of social media texts during major social events makes it possible to monitor public opinion effectively. To address the low accuracy and efficiency of existing sentiment analysis algorithms for Chinese social media, an ensemble sentiment analysis method for big data (S-FWS), built on the Spark distributed system, is proposed. First, new words are discovered by computing the pointwise mutual information (PMI) association degree after pre-segmentation with the Jieba library. Then, text features are extracted with the importance of words taken into account, and feature selection is performed with Lasso. Finally, because the traditional Stacking framework neglects feature importance, the accuracy of the primary learners is used to weight the probabilistic features, and polynomial features are constructed to train the secondary learner. Comparative experiments with a variety of algorithms are carried out in stand-alone mode and on the Spark platform, respectively. The results show that the proposed S-FWS method has certain advantages in accuracy and time consumption; the distributed system can greatly improve the running efficiency of the algorithms, and the time consumption of the algorithms gradually decreases as the number of worker nodes increases.
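
The first and last steps named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pmi_scores` and `weighted_poly_features` are hypothetical, the input documents are assumed to be already segmented (for example by Jieba), the accuracy weights are assumed to be normalized to sum to one, and degree 2 is an assumed default for the polynomial expansion. The abstract does not give the exact formulas, and the sketch runs on a single machine rather than on Spark.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def pmi_scores(segmented_docs):
    """Return the PMI (log base 2) of every adjacent token pair.

    segmented_docs: list of token lists, assumed to be pre-segmented text.
    """
    unigrams, bigrams = Counter(), Counter()
    for tokens in segmented_docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi          # probability of the adjacent pair
        p_a = unigrams[a] / n_uni    # probability of the left token
        p_b = unigrams[b] / n_uni    # probability of the right token
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def weighted_poly_features(probas, accuracies, degree=2):
    """Weight primary-learner probabilities by accuracy, then expand them.

    probas: one predicted probability per primary learner for a sample.
    accuracies: validation accuracy of each primary learner (assumed input).
    The normalization of the weights is an assumption of this sketch.
    """
    total = sum(accuracies)
    weights = [acc / total for acc in accuracies]
    weighted = [p * w for p, w in zip(probas, weights)]
    features = []
    # All monomials of the weighted probabilities from degree 1 up to `degree`.
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(weighted, d):
            features.append(math.prod(combo))
    return features
```

Adjacent token pairs with a high PMI score are candidates for merging into a new word, and the expanded vector returned by `weighted_poly_features` would serve as the input to the secondary learner.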

Key words: Chinese social media, Public opinion, Sentiment analysis, Spark, Stacking

CLC Number: 

  • TP391