Computer Science ›› 2017, Vol. 44 ›› Issue (4): 35-38. doi: 10.11896/j.issn.1002-137X.2017.04.008
李文鹏,赵俊峰,谢冰
LI Wen-peng, ZHAO Jun-feng and XIE Bing
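Abstract: Understanding the functionality of software code is an important step in software reuse. Code comprehension methods based on topic modeling can mine the latent topics in software code, and these topics represent, to some extent, the functionality that the code implements. However, the code topics mined by topic modeling tend to be semantically vague and hard to understand. Latent Dirichlet Allocation (LDA) is a commonly used topic modeling technique that has achieved good results in mining topics from software code, but it suffers from the same problem. Explanatory text descriptions therefore need to be generated for the topics. The LDA-based method for automatically generating summaries of software code topics not only applies topic modeling to source code to generate topics, but also draws on software resources that describe the system's functionality, such as documentation and Q&A information, to mine descriptive text for each code topic and extract a summary from it. This better helps developers understand what the software does.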
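The idea of attaching descriptive text to a mined topic can be illustrated with a minimal sketch. It is not the authors' method: the topic weights below are hypothetical values standing in for LDA output, the candidate sentences are invented examples of documentation text, and the scoring rule (average topic weight per token) is an assumption chosen for simplicity.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, breaking camelCase identifiers."""
    # Insert a space at lower-to-upper boundaries: "parseQuery" -> "parse Query".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def score_sentence(sentence: str, topic: Dict[str, float]) -> float:
    """Average topic weight per token; words outside the topic count as 0."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(topic.get(t, 0.0) for t in tokens) / len(tokens)


def summarize_topic(topic: Dict[str, float], sentences: List[str], k: int = 1) -> List[str]:
    """Return the k candidate sentences that best match the topic's words."""
    ranked = sorted(sentences, key=lambda s: score_sentence(s, topic), reverse=True)
    return ranked[:k]


# Hypothetical topic (word -> weight), standing in for one LDA topic over code.
topic = {"index": 0.30, "query": 0.25, "search": 0.20, "document": 0.15}

# Hypothetical sentences taken from documentation or Q&A posts.
docs = [
    "The IndexSearcher runs a query against the index and returns matching documents.",
    "Release notes are published on the project website.",
    "Use the build script to compile the project.",
]

summary = summarize_topic(topic, docs, k=1)
```

Here `summary` holds the sentence whose vocabulary overlaps most with the topic words. A fuller pipeline would learn the topics with an LDA implementation and would likely use a stronger sentence-ranking step than this word-overlap score.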