Computer Science, 2023, Vol. 50, Issue (5): 64-71. doi: 10.11896/jsjkx.220100094

• Explainable AI •

Code Embedding Method Based on Neural Network

SUN Xuekai, JIANG Liehui   

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing, PLA Information Engineering University, Zhengzhou 450001, China
  • Received: 2022-01-11 Revised: 2022-09-19 Online: 2023-05-15 Published: 2023-05-06
  • About author: SUN Xuekai, born in 1991, Ph.D. candidate. His main research interests include code similarity detection and code vulnerability mining.
    JIANG Liehui, born in 1967, Ph.D., professor. His main research interests include computer architecture, reverse engineering, and cyberspace security.

Abstract: Code analysis has many application scenarios, such as code plagiarism detection and software vulnerability search. With the development of artificial intelligence, neural networks have been widely applied to code analysis. However, existing methods either treat code as ordinary natural-language text or sample it with highly complex rules. The former tends to lose key information in the code, while the latter makes the algorithm overly complicated and model training time-consuming. Alon et al. proposed the Code2vec algorithm, which has significant advantages over earlier code analysis methods but still has some limitations. This paper therefore proposes a code embedding method based on neural networks, whose main idea is to represent a code function as an embedding vector. First, the function is decomposed into a series of abstract syntax tree paths; then a neural network learns how to represent each path; finally, all paths are aggregated into a single embedding vector that represents the function. A prototype system based on this method is implemented. Experimental results show that, compared with Code2vec, the new algorithm has a simpler structure and trains faster.
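The three steps named in the abstract (decompose a function into abstract syntax tree paths, represent each path, aggregate the paths into one vector) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the sketch parses Python source with the standard `ast` module, treats `Name`, `arg`, and `Constant` nodes as terminals, and replaces the learned neural network with a deterministic hash-based stand-in followed by a plain average. The names `path_contexts`, `embed_context`, `embed_function`, and `DIM` are hypothetical.

```python
import ast
import hashlib
from itertools import combinations

DIM = 8  # hypothetical embedding size, chosen only for illustration


def path_contexts(source):
    """Step 1: decompose source code into (token, AST path, token) triples."""
    tree = ast.parse(source)
    terminals = []  # each entry: (token, list of nodes from root to terminal)

    def walk(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            terminals.append((node.id, trail))
        elif isinstance(node, ast.arg):
            terminals.append((node.arg, trail))
        elif isinstance(node, ast.Constant):
            terminals.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                walk(child, trail)

    walk(tree, [])
    contexts = []
    for (tok_a, a), (tok_b, b) in combinations(terminals, 2):
        # Length of the shared prefix; its last node is the lowest common ancestor.
        k = 0
        while k < min(len(a), len(b)) and a[k] is b[k]:
            k += 1
        up = [type(n).__name__ for n in reversed(a[k:])]
        top = type(a[k - 1]).__name__
        down = [type(n).__name__ for n in b[k:]]
        contexts.append((tok_a, tuple(up + [top] + down), tok_b))
    return contexts


def embed_context(context):
    """Step 2 (stand-in): map one path context to a fixed vector.

    Assumption: a real system would learn this mapping with a neural network;
    a hash is used here only so the sketch runs without training.
    """
    digest = hashlib.sha256(repr(context).encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:DIM]]


def embed_function(source):
    """Step 3: aggregate all path vectors into one embedding (plain mean)."""
    contexts = path_contexts(source)
    if not contexts:
        return [0.0] * DIM
    vectors = [embed_context(c) for c in contexts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Calling `embed_function("def add(a, b):\n    return a + b\n")` returns a list of `DIM` floats. Averaging is the simplest possible aggregation; Code2vec, by comparison, weights path contexts with a learned attention mechanism, and the aggregation used by the proposed method is not specified in the abstract.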

Key words: Neural network, Code embedding, Code analysis, Abstract syntax tree, Code classification

CLC Number: TP311