Computer Science, 2015, Vol. 42, Issue (9): 159-164. doi: 10.11896/j.issn.1002-137X.2015.09.031


Data Driven Feature Extraction for Mining Software Repositories

LI Xiao-chen, JIANG He and REN Zhi-lei   

Online: 2018-11-14  Published: 2018-11-14

Abstract: In mining software repositories (MSR), software tasks are usually transformed into data mining problems and solved as such. Domain-specific features strongly affect how well these tasks are solved. However, no systematic investigation has been conducted into how to extract features for specific software tasks. In this study, we propose data-driven feature extraction (DDFE), a new feature extraction approach. For a given software task, DDFE extracts a set of software data (e.g., source code, bug reports) and recruits volunteers to accomplish the task manually. During this process, the volunteers are asked to submit the reasons behind their decisions. From these submitted reasons, DDFE extracts domain-specific features for the task. Experimental results on the task of bug report summarization show that DDFE can find effective features and achieve better predictive results than the state-of-the-art algorithm in the literature.
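The abstract does not state how submitted reasons are turned into features. The sketch below is one possible reading, written under explicit assumptions: each volunteer attaches a free-text reason to a sentence they selected for a bug report summary, reasons are mapped to candidate features through a hand-built keyword table, and a candidate is kept only if enough distinct volunteers support it. The `Reason` record, `KEYWORD_TO_FEATURE` table, feature names, and `min_volunteers` threshold are all hypothetical illustrations, not the authors' method.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Reason:
    """Hypothetical record: one volunteer's justification for choosing
    a sentence as part of a bug report summary."""
    volunteer: str
    sentence_id: int
    text: str


# Hypothetical keyword -> feature table. In a real study this mapping would
# come from manually analysing the reasons volunteers actually submitted.
KEYWORD_TO_FEATURE = {
    "error": "contains_error_message",
    "exception": "contains_error_message",
    "steps": "describes_reproduction_steps",
    "reproduce": "describes_reproduction_steps",
    "fix": "proposes_solution",
    "patch": "proposes_solution",
}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def extract_features(reasons, min_volunteers=2):
    """Return (sorted) the candidate features supported by at least
    `min_volunteers` distinct volunteers (assumed support threshold)."""
    supporters = defaultdict(set)
    for r in reasons:
        words = set(tokenize(r.text))
        for keyword, feature in KEYWORD_TO_FEATURE.items():
            if keyword in words:
                supporters[feature].add(r.volunteer)
    return sorted(f for f, vols in supporters.items() if len(vols) >= min_volunteers)


def feature_vector(sentence, features):
    """Encode a bug report sentence as a binary vector over the extracted
    features, e.g. as input to a summarization classifier."""
    words = set(tokenize(sentence))
    active = {f for kw, f in KEYWORD_TO_FEATURE.items() if kw in words}
    return [1 if f in active else 0 for f in features]


if __name__ == "__main__":
    reasons = [
        Reason("v1", 3, "It quotes the exception stack trace"),
        Reason("v2", 3, "Shows the error that users hit"),
        Reason("v1", 5, "Gives the steps to reproduce"),
        Reason("v3", 7, "Suggests a patch"),
    ]
    features = extract_features(reasons)
    print(features)
    print(feature_vector("An exception is thrown on startup", features))
```

With the default threshold of two volunteers, only the error-message feature survives in this toy input, because the other two candidates are each backed by a single volunteer. The threshold is one simple way to filter idiosyncratic reasons; a real pipeline might instead weight reasons or merge synonymous ones manually.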

Key words: Mining software repositories, Data-driven approach, Feature extraction, Bug report summarization
