Computer Science ›› 2023, Vol. 50 ›› Issue (12): 75-81. doi: 10.11896/jsjkx.230100115

• Computer Software •

CodeBERT-based Language Model for Design Patterns

CHEN Shifei, LIU Dong, JIANG He   

  1. School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116620, China
  • Received: 2023-01-31  Revised: 2023-05-22  Online: 2023-12-15  Published: 2023-12-07
  • About author: CHEN Shifei, born in 1998, master. His main research interest is natural language processing.
    JIANG He, born in 1980, Ph.D., professor, is a distinguished member of the China Computer Federation. His main research interests include system software and software engineering.
  • Supported by:
    National Natural Science Foundation of China (61722202).

Abstract: As distilled experience from practical software design, design patterns are regarded as an effective aid to software design. Most current research on design pattern mining aims at recognizing design pattern instances in source code, while modelling design patterns with natural-language corpora remains largely unexplored. To improve the ability of a language model to recommend design patterns from code, class diagrams, or object collaborations, a CodeBERT-based design pattern classification and mining model named dpCodeBERT is proposed, which achieves a comparative understanding of design patterns across natural language and programming language. First, a multi-class classification dataset and a code search dataset are generated by random combination and used as inputs of the model, and dpCodeBERT is used to obtain the attention weights of each token and statement of the inputs at every transformer layer. Second, the input datasets are further improved by analyzing the attention weights and identifying the most important category of inputs. Finally, dpCodeBERT is applied to specific downstream software engineering tasks, namely design pattern selection and design pattern code search. These tasks are accomplished by mapping the distributed features to the sample space through fully connected layers and outputting multiple values. Experiments on 80 software design problems in the design pattern selection task show that dpCodeBERT improves the ratio of correct detection of design pattern (RCDDP) and the mean reciprocal rank (MRR) by 10%~20% on average compared with the baseline models, making design pattern selection more accurate. Through an in-depth study of the data requirements of the model, dpCodeBERT improves CodeBERT's understanding of class code and explores the application of CodeBERT to design pattern mining. It offers accurate prediction and good scalability.
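
For readers unfamiliar with the MRR metric named in the abstract, the following is a minimal sketch of how a mean reciprocal rank is commonly computed: for each query, take the reciprocal of the 1-based rank at which the correct item appears, and average over all queries. The function name, the pattern names and the sample rankings are hypothetical and serve only as illustration; the paper's own evaluation protocol and data are not reproduced here.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    correct: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the correct item over all queries (0 if absent)."""
    if len(rankings) != len(correct):
        raise ValueError("one correct answer is needed per ranking")
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, answer in zip(rankings, correct):
        # Rank is 1-based; a ranking that misses the answer contributes 0.
        for position, candidate in enumerate(ranked, start=1):
            if candidate == answer:
                total += 1.0 / position
                break
    return total / len(rankings)


# Hypothetical example: three design problems, each with a ranked pattern list.
rankings = [
    ["Observer", "Strategy", "Adapter"],
    ["Factory Method", "Singleton", "Builder"],
    ["Decorator", "Proxy", "Composite"],
]
correct = ["Observer", "Singleton", "Facade"]
score = mean_reciprocal_rank(rankings, correct)
print(round(score, 3))  # prints 0.5
```

In this illustrative case the correct pattern is ranked first, second, and not at all, so the score is (1 + 1/2 + 0) / 3 = 0.5.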

Key words: Design pattern mining, Natural language processing, Pre-trained language models, CodeBERT, Model fine-tuning, Vector quantization

CLC Number: TP311