Computer Science (计算机科学), 2023, Vol. 50, Issue 12: 75-81. doi: 10.11896/jsjkx.230100115
CHEN Shifei, LIU Dong, JIANG He
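Abstract: Design patterns are empirical summaries of practical software design solutions and are one of the effective means of supporting software design during development. Most existing research on design pattern mining aims to identify instances of design patterns in source code; few studies consider modeling design patterns with natural-language corpora. To improve the recommendation performance of design pattern language classification models by taking code, class diagrams, and object collaborations into account, this paper proposes dpCodeBERT, a CodeBERT-based model for design pattern classification mining that enables a joint understanding of natural language and code. First, multi-class classification data and code search data are synthesized by random combination and used as model input, from which dpCodeBERT obtains the attention weights that the transformer layers generate for each token. Then, token-level and statement-level attention weights are analyzed to find more effective categories of model input, and the training input is revised accordingly. Finally, dpCodeBERT uses a fully connected layer to map the distributed features to the sample space and output multiple values, which supports concrete software engineering tasks such as design pattern selection and design pattern code search. Experiments on a design pattern selection dataset containing 80 software design problems show that, compared with baseline models of the same kind, the proposed model improves the design pattern detection accuracy (RCDDP) and the mean reciprocal rank (MRR) by 10% to 20% on average, making design pattern selection more accurate. Through an in-depth study of the model's data requirements, dpCodeBERT exploits CodeBERT's understanding of class-level code and explores the application of CodeBERT to design pattern mining; it offers accurate predictions and strong extensibility.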
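To make the MRR metric named in the abstract concrete, the following is a minimal sketch of how mean reciprocal rank is conventionally computed for a pattern-selection task. It is not the authors' evaluation code: the function name `mean_reciprocal_rank`, the example design problems, and the choice to score a missing target as 0 are all assumptions made for illustration.

```python
from typing import Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[str]],
    targets: Sequence[str],
) -> float:
    """Average of 1/rank of the correct pattern over all design problems.

    rankings[i] is the model's ordered list of recommended patterns for
    problem i (best first); targets[i] is the correct pattern.
    Assumption: a target absent from the ranking contributes 0.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        raise ValueError("at least one design problem is required")

    total = 0.0
    for ranking, target in zip(rankings, targets):
        if target in ranking:
            # list.index is 0-based, ranks are 1-based
            total += 1.0 / (list(ranking).index(target) + 1)
    return total / len(rankings)


# Hypothetical recommendations for three design problems.
example_rankings = [
    ["Observer", "Strategy", "Adapter"],         # correct answer at rank 2
    ["Adapter", "Facade", "Proxy"],              # correct answer at rank 1
    ["Singleton", "Builder", "Factory Method"],  # correct answer missing
]
example_targets = ["Strategy", "Adapter", "Command"]

example_mrr = mean_reciprocal_rank(example_rankings, example_targets)
print(f"MRR = {example_mrr:.3f}")
```

For the hypothetical data above, the reciprocal ranks are 1/2, 1, and 0, so the mean is 0.5. A higher MRR means the correct pattern tends to appear nearer the top of the recommendation list.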