Computer Science, 2020, Vol. 47, Issue (10): 69-74. doi: 10.11896/jsjkx.190700034

• Database & Big Data & Data Science •

Helpfulness Degree Prediction Model of Online Reviews Fusing Information Gain and Gradient Decline Algorithms

FENG Jin-zhan, CAI Shu-qin

  1. School of Management, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received: 2019-07-03 Revised: 2019-09-27 Online: 2020-10-15 Published: 2020-10-16
  • Corresponding author: CAI Shu-qin (caishuqin@hust.edu.cn)
  • About the authors: FENG Jin-zhan (fjinzhan@hust.edu.cn), born in 1981, postgraduate. His main research interests include business intelligence, management information, and network complaint handling.
    CAI Shu-qin, born in 1955, Ph.D., professor, Ph.D. supervisor. Her main research interests include business intelligence and management information systems.
  • Supported by:
    National Natural Science Foundation of China (71371081) and Specialized Research Fund for the Doctoral Program of Higher Education (20130142110044)

Abstract: Because reviewers cannot tell in advance whether the text of an online product review will be helpful to readers, large numbers of unhelpful reviews get written. These raise the information-search cost for potential consumers and can even reduce the likelihood that they buy the product. To raise the share of helpful reviews on e-commerce platforms and to give review writers a way to test a draft, this paper builds a model that predicts the degree of helpfulness of an online review. Based on the textual characteristics of reviews, the model uses three features: the number of words, the helpful value of the words, and the number of product features mentioned. The helpful value of a word is its information gain for distinguishing the helpfulness degree of reviews. The model parameters are then solved by gradient descent on a large set of online reviews. Experimental results show that review helpfulness rises as the number of words, the helpful value of words, and the number of product features increase. Reviews are divided into three degrees: general, helpful, and very helpful. Prediction precision is 92.96% for "general" reviews, 94.83% for "helpful" reviews, and 67.63% for "very helpful" reviews. Across the three degrees, the model achieves an average precision of 85.05%, a recall of 82.81%, and an F1 score of 83.72%, which supports the feasibility of using the model to predict the helpfulness degree of online reviews.

Key words: Gradient descent algorithm, Helpfulness degree, Information gain, Online reviews
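
The abstract defines the helpful value of a word as its information gain for distinguishing helpfulness degrees. The following is a minimal sketch of that quantity, assuming a simple presence/absence split per word and Shannon entropy in bits; the abstract does not give the paper's exact formulation. The function names and the toy reviews and labels are hypothetical and are not the paper's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(reviews, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs in a review.

    `reviews` is a list of token sets; `labels` holds one helpfulness degree
    per review. Both are illustrative inputs, not the paper's data format.
    """
    with_word = [lab for toks, lab in zip(reviews, labels) if word in toks]
    without_word = [lab for toks, lab in zip(reviews, labels) if word not in toks]
    total = len(labels)
    conditional = (
        (len(with_word) / total) * entropy(with_word)
        + (len(without_word) / total) * entropy(without_word)
    )
    return entropy(labels) - conditional


# Toy, made-up reviews (as token sets) with hypothetical helpfulness labels.
reviews = [
    {"battery", "lasts", "two", "days"},
    {"battery", "charges", "fast"},
    {"good"},
    {"ok"},
]
labels = ["helpful", "helpful", "general", "general"]

ig_battery = information_gain(reviews, labels, "battery")
ig_good = information_gain(reviews, labels, "good")
print(ig_battery, ig_good)
```

In this toy set, "battery" separates the two classes perfectly, so it receives a higher gain than "good", which appears in only one review. A word that never occurs gets a gain of zero.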

CLC Number:

  • TP391
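
The abstract states that the model parameters are solved by gradient descent over three features: word count, word helpful value, and product-feature count. The sketch below shows one way such a fit could look. It assumes a linear model with a mean-squared-error loss, which the abstract does not confirm. The function name, learning rate, feature scaling, and synthetic data are all hypothetical.

```python
def fit_gradient_descent(rows, targets, lr=0.1, epochs=5000):
    """Fit y ~= b + w.x by batch gradient descent on mean squared error.

    Assumption: a linear model; the paper's actual functional form may differ.
    """
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error for every training row under the current parameters.
        errors = [
            b + sum(wj * xj for wj, xj in zip(w, x)) - y
            for x, y in zip(rows, targets)
        ]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = [
            2.0 / n * sum(e * x[j] for e, x in zip(errors, rows))
            for j in range(d)
        ]
        grad_b = 2.0 / n * sum(errors)
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic rows: (word count, summed word helpful value, product-feature count),
# each pre-scaled to [0, 1]. Targets come from made-up parameters so that the
# fit can be checked against a known answer.
rows = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [0.5, 0.2, 0.8],
]
true_w, true_b = [0.5, 1.0, 0.3], 0.1
targets = [true_b + sum(wj * xj for wj, xj in zip(true_w, x)) for x in rows]

weights, bias = fit_gradient_descent(rows, targets)
print(weights, bias)
```

On this noise-free synthetic data the fitted weights approach the made-up generating parameters. Real review data would need feature scaling and a rule for mapping the continuous score onto the three helpfulness degrees, neither of which the abstract specifies.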