Computer Science (计算机科学), 2015, Vol. 42, Issue (9): 159-164. doi: 10.11896/j.issn.1002-137X.2015.09.031

• Software and Database Technology •

Data Driven Feature Extraction for Mining Software Repositories

LI Xiao-chen, JIANG He and REN Zhi-lei

  1. School of Software, Dalian University of Technology, Dalian 116621
  • Online: 2018-11-14 Published: 2018-11-14
  • Supported by:
    This work was supported by the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-13-0073) and the National Natural Science Foundation of China (61175062,4).

Abstract: In mining software repositories (MSR), software engineering tasks are usually transformed into data mining problems. The choice of domain-specific features heavily affects how well these tasks are solved. However, no systematic investigation has been conducted on how to extract valuable features from software repository data for a specific task. This paper presents data-driven feature extraction (DDFE), a new feature extraction approach. For a given software engineering task, DDFE selects part of the task's data set (e.g., source code, bug reports) and recruits volunteers to complete the task manually. The volunteers are asked to state the factors they considered while completing it. By analyzing these factors, DDFE extracts the domain-specific features the task requires. Experimental results on the task of bug report summarization show that DDFE can find effective domain-specific features and achieves better predictive results than the state-of-the-art algorithm in the literature.

Key words: Mining software repositories, Data driven approach, Feature extraction, Bug report summarization
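The factor-analysis step described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the abstract does not say how the volunteers' factors are analyzed, so the frequency threshold below, the function name `candidate_features`, and all factor labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def candidate_features(reports, min_volunteers=2):
    """Tally factors reported by volunteers and keep the recurring ones.

    `reports` maps a volunteer id to the list of factor labels that the
    volunteer said they considered. The threshold rule is an assumption
    made for this sketch, not the selection rule used in the paper.
    """
    counts = Counter()
    for factors in reports.values():
        # Count each factor at most once per volunteer.
        counts.update(set(factors))
    kept = [(f, n) for f, n in counts.items() if n >= min_volunteers]
    # Most frequently mentioned first; ties broken alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical factors for a bug report summarization task.
reports = {
    "v1": ["contains_stack_trace", "written_by_reporter", "near_top_of_report"],
    "v2": ["written_by_reporter", "contains_stack_trace"],
    "v3": ["mentions_fix", "written_by_reporter"],
}
features = candidate_features(reports)
print(features)
```

Factors that several volunteers mention independently are kept as candidate features, while factors named by a single volunteer are dropped. In practice each retained factor would still have to be turned into a measurable feature of the software data.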

