Computer Science, 2014, Vol. 41, Issue 9: 52-59. doi: 10.11896/j.issn.1002-137X.2014.09.008


Code Function Mining Tool Based on Topic Modeling Technology

HUA Zhe-bang, LI Meng, ZHAO Jun-feng, ZOU Yan-zhen, XIE Bing and LI Yang

Online: 2018-11-14   Published: 2018-11-14

Abstract: Code reuse is an important form of software reuse, and software engineers need to understand what a piece of code does before they can reuse it. This paper proposes a Code Function Mining (CFM) method based on static code analysis and Latent Dirichlet Allocation (LDA). CFM is a code-oriented method for mining, filtering, organizing, and describing topics, and it is divided accordingly into four parts: Topic Mining, Topic Filtering, Topic Organizing, and Topic Describing. Its output is a hierarchy of functional topics, each with a description. The descriptions help readers learn what the code does, and the hierarchy lets them understand code functions at different levels of abstraction. CFM can supplement traditional methods based on topic modeling, compensating for their lack of semantic analysis of topics. A CFM tool built on the method automatically analyzes code and presents the functional topic hierarchy to users through a Web page. To verify the validity of the method, an experimental analysis of several of its key algorithms is also presented.

Key words: Software code, Static code analysis, LDA, Code function mining
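
The abstract does not give implementation details, so the following is only a minimal sketch of what the Topic Mining step could look like, under stated assumptions. It treats each code unit as a document, splits identifiers into terms, and fits LDA with a generic collapsed Gibbs sampler. The names `split_identifier`, `extract_terms`, `lda_gibbs`, `top_terms`, the stop list, the hyperparameters, and the sample code units are all hypothetical and are not taken from the CFM tool; the filtering, organizing, and describing steps are not shown.

```python
import random
import re
from collections import Counter

# Hypothetical stop list: keywords and common type names with no functional meaning.
STOP_TERMS = {"public", "private", "static", "void", "int", "string",
              "return", "new", "class", "this", "list"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def extract_terms(source):
    """Crude stand-in for static analysis: collect terms from all identifiers."""
    terms = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        terms.extend(t for t in split_identifier(ident)
                     if t not in STOP_TERMS and len(t) > 2)
    return terms


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns doc-topic and topic-word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every term occurrence.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_terms(topic_word, k, n=3):
    """Most frequent terms currently assigned to topic k."""
    return [w for w, c in topic_word[k].most_common(n) if c > 0]


# Hypothetical code units (for example, one entry per method).
units = {
    "FileUtil.readFile": "public String readFile(File inputFile) { "
                         "BufferedReader reader = openFile(inputFile); "
                         "return reader.readLine(); }",
    "FileUtil.writeFile": "public void writeFile(File outputFile, String text) { "
                          "FileWriter writer = openWriter(outputFile); "
                          "writer.write(text); }",
    "HttpClient.sendRequest": "public HttpResponse sendRequest(HttpRequest request) { "
                              "Connection connection = openConnection(request.getUrl()); "
                              "return connection.send(request); }",
}
docs = [extract_terms(src) for src in units.values()]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print("topic", k, top_terms(topic_word, k))
```

On a corpus this small the resulting topics are not meaningful. The sketch is meant only to show the data flow from identifiers to per-unit topic counts, which a later step could filter and arrange into a hierarchy.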

