Computer Science (计算机科学), 2014, Vol. 41, Issue (9): 52-59. doi: 10.11896/j.issn.1002-137X.2014.09.008
华哲邦,李萌,赵俊峰,邹艳珍,谢冰,李扬
HUA Zhe-bang,LI Meng,ZHAO Jun-feng,ZOU Yan-zhen,XIE Bing and LI Yang
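Abstract: Code reuse is one of the most important forms of software reuse, and reusers must understand the functionality that code implements before they can reuse it effectively. Program comprehension methods based on topic modeling are attracting growing attention from researchers because they help software developers and users better understand what software does. However, existing topic-modeling-based program comprehension methods generally lack semantic analysis of the mined topics. To complement them, this paper proposes Code Function Mining (CFM), a method based on static code analysis and LDA. CFM is a set of techniques for mining, filtering, organizing, and describing topics, with source code as the object of study. It generates a hierarchy of functional topics with descriptions, so that users can browse and learn the functionality of software more clearly and conveniently. The descriptions of the functional topics help reusers understand code functionality, and the hierarchy lets them understand it at different levels of abstraction. The CFM method consists of four parts: mining topics, filtering topics, organizing topics, and describing topics. Based on the CFM method, a CFM tool was designed and implemented. The tool analyzes code submitted by users and presents the hierarchy of described functional topics through Web pages. Finally, experiments on several key algorithms of the CFM method verify its effectiveness.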
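To make the topic-mining step more concrete, the following is a minimal sketch of how LDA topics might be mined from source code. It is not the authors' implementation: the abstract does not specify the preprocessing, the inference algorithm, or the hyperparameters. The helper names (`split_identifier`, `code_to_terms`, `lda_gibbs`), the stop-word list, the collapsed Gibbs sampler, and the values of `alpha` and `beta` are all assumptions chosen for illustration. The filtering, organizing, and describing steps of CFM are not shown.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual filtering is not specified.
STOP_WORDS = {"def", "return", "self", "int", "void", "public", "class", "new"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def code_to_terms(source):
    """Turn one code unit into a bag of terms taken from its identifiers."""
    terms = []
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        terms.extend(t for t in split_identifier(ident)
                     if t not in STOP_WORDS and len(t) > 1)
    return terms


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed here; other inference works too).

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Toy corpus: each string stands in for one code unit (e.g. a method body).
units = [
    "def readFile(filePath): return openFile(filePath).readLines()",
    "def write_file(file_path, lines): saveLines(file_path, lines)",
    "def sendHttpRequest(url): return httpClient.sendRequest(url)",
    "def parse_http_response(response): return httpParser.parse(response)",
]
docs = [code_to_terms(src) for src in units]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, c in counts.most_common(3) if c > 0])
```

The top terms printed for each topic are the raw material that a later step would have to filter and describe. On a corpus this small the topics are unstable, so the sketch shows only the mechanics.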