Computer Science, 2020, Vol. 47, Issue (12): 56-64. doi: 10.11896/jsjkx.201200031

Special topic: Software Engineering and Requirements Engineering for Complex Systems

Software Requirement Mining Method for Chinese APP User Review Data

WANG Ying, ZHENG Li-wei, ZHANG Yu-yao, ZHANG Xiao-yun

  1. School of Computer Science, Beijing Information Science and Technology University, Beijing 100101, China
  • Received: 2020-09-03 Revised: 2020-10-31 Online: 2020-12-15 Published: 2020-12-17
  • Corresponding author: ZHENG Li-wei (zlw@bistu.edu.cn)
  • Author contact: wy_2407556211@163.com
  • About author: WANG Ying, born in 1996, postgraduate. Her main research interests include requirement engineering and social networks.
    ZHENG Li-wei, born in 1979, Ph.D., associate professor. His main research interests include requirement engineering, social networks and data quality enhancement.
  • Supported by:
    National Natural Science Foundation of China (61402043).

Abstract: Mining user requirements from APP user feedback data is an important way to support APP iteration and requirements elicitation. Users publish reviews of different dimensions of an APP in the APP application market, and these reviews contain users' requirements for improving the APP. However, current user feedback data is large in volume and uneven in quality, so mining valuable requirements from massive user review data with little time and effort is of both research and practical significance. Focusing on APP development, this paper selects APP user review data from the 360 mobile assistant, aiming to mine the software requirements contained in these reviews. Firstly, from the two dimensions of functional and non-functional requirements, the software requirements contained in APP user review data are divided into five categories: functions to be added, functions to be improved, performance, availability, and reliability. Secondly, user reviews are collected and labeled to construct an APP review requirements mining data set. Finally, the constructed data set is used for model training and cross-validation to explore the performance of mainstream deep learning methods compared with statistical machine learning models on this task. The experimental results show that the deep learning models used in this paper, TextCNN, TextRNN, and Transformer, have an advantage over traditional statistical machine learning models on this task.

Key words: APP user reviews, Chinese data set, Machine learning, Software requirements mining

CLC Number: 

  • TP311
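
The evaluation setup described in the abstract (five requirement categories, a labeled review data set, and cross-validation of a classifier) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a hand-written multinomial naive Bayes over character bigrams stands in for a statistical machine learning baseline, the fold assignment is a simple round-robin, and `TOY_REVIEWS` is a set of invented example reviews, not data from the paper.

```python
import math
from collections import Counter, defaultdict

# The five requirement categories named in the abstract.
CATEGORIES = [
    "functions to be added",
    "functions to be improved",
    "performance",
    "availability",
    "reliability",
]

# Invented reviews for illustration only (hypothetical, not from the paper's data set).
TOY_REVIEWS = [
    ("希望增加夜间模式", "functions to be added"),
    ("搜索功能不好用", "functions to be improved"),
    ("启动太慢了", "performance"),
    ("界面太复杂不会用", "availability"),
    ("经常闪退", "reliability"),
]


def char_bigrams(text):
    """Split a review into character bigrams, so no word segmenter is needed."""
    text = text.strip()
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (assumed baseline)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_bigrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for gram in char_bigrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k=5):
    """Mean accuracy over k folds; fold f holds items f, f+k, f+2k, ...

    Assumes len(texts) >= k so that no fold is empty.
    """
    accuracies = []
    for fold in range(k):
        test_idx = [i for i in range(len(texts)) if i % k == fold]
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        model = NaiveBayes().fit([texts[i] for i in train_idx],
                                 [labels[i] for i in train_idx])
        hits = sum(model.predict(texts[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / k


if __name__ == "__main__":
    texts, labels = zip(*TOY_REVIEWS)
    model = NaiveBayes().fit(texts, labels)
    print(model.predict("经常闪退"))
```

Character bigrams are used here only to keep the sketch free of a Chinese word segmenter. A comparison like the one the abstract reports would run the same cross-validation loop over each candidate model on the labeled data set and compare the resulting scores.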