Computer Science, 2017, Vol. 44, Issue (4): 35-38. doi: 10.11896/j.issn.1002-137X.2017.04.008


Summary Extraction Method for Code Topic Based on LDA

LI Wen-peng, ZHAO Jun-feng and XIE Bing   

Online: 2018-11-13 Published: 2018-11-13

Abstract: Understanding the function of software code is important in software reuse. Topic modeling techniques can mine latent topics from software code, and these topics represent the software's functions. However, the topics lack an unambiguous explanation, which makes them hard for developers to understand. Latent Dirichlet allocation (LDA) is one of the popular topic modeling techniques. Previous studies have used LDA to mine software code with good results, but the problem of topic description remains. In this paper, in addition to using topic modeling to generate topics from source code, explanatory text descriptions are generated for the code topics from software resources such as documents, question-and-answer pairs, and mailing lists. These descriptions help users understand the function of the software code. Experiments show that the proposed approach is effective.

Key words: Software code, LDA, Code function mining, Software document, Summarization
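
The abstract does not say how the explanatory descriptions are selected, so the following is only a minimal sketch of one plausible step: given a topic expressed as word weights (as an LDA model might produce), rank candidate sentences from documentation by the average topic weight of their tokens. The function names, the topic weights, and the candidate sentences are all hypothetical and are not taken from the paper.

```python
import re


def tokenize(text):
    """Lowercase and split on non-letters, splitting camelCase identifiers first."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", text.lower()) if t]


def score_sentence(sentence, topic_weights):
    """Average topic weight over the sentence's tokens (0.0 if it has none)."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(topic_weights.get(t, 0.0) for t in tokens) / len(tokens)


def describe_topic(topic_weights, sentences, k=1):
    """Return the k sentences that best match the topic, highest score first."""
    ranked = sorted(
        sentences,
        key=lambda s: score_sentence(s, topic_weights),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical topic: word -> probability, standing in for LDA output.
topic = {"index": 0.30, "search": 0.25, "query": 0.20,
         "document": 0.15, "writer": 0.10}

# Hypothetical sentences, standing in for documents, Q&A pairs, or mailing lists.
candidates = [
    "The IndexWriter adds each document to the search index.",
    "Please read the contribution guidelines before sending a patch.",
    "A query is parsed and then run against the index.",
]

best = describe_topic(topic, candidates, k=1)
print(best[0])
```

Averaging over sentence length keeps long sentences from winning only because they contain more words. A real system would likely also remove stop words and weigh sentence centrality, which this sketch omits.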

