Computer Science ›› 2020, Vol. 47 ›› Issue (9): 1-9. doi: 10.11896/jsjkx.191200170

• Computer Software •

CodeSearcher: Code Query Using Functional Descriptions in Natural Languages

LU Long-long, CHEN Tong, PAN Min-xue, ZHANG Tian   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2019-12-30 Published: 2020-09-10
  • About author: LU Long-long, born in 1993, master. His main research interests include software verification and model checking.
    PAN Min-xue, born in 1983, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include software modelling & verification, software analysis & testing, mobile computing and intelligent software engineering.
  • Supported by:
    National Natural Science Foundation of China (61972193) and Fundamental Research Funds for the Central Universities (14380022, 14380020).

Abstract: When a developer must implement a function but does not know how to do so in a specific programming language, he or she usually performs a code query in natural language. Performing code queries while programming is time-consuming and labor-intensive. Many code query tools have been proposed in recent years to assist developers, but most of them require complex inputs or have low precision. We propose CodeSearcher, a new code query approach based on natural language descriptions. Relying on 〈natural language description, code snippet〉 pairs extracted from Stack Overflow, a Q&A website for software development, we design a neural network model and a corresponding training method that map natural language descriptions and code snippets to the same vector space. CodeSearcher differs from conventional code query systems in two ways. First, it accepts any user-provided code base for searching, because it relies only on the source code and not on comments or descriptions of that code. Second, it does not limit the query process to "enter a natural language description and receive code snippets"; it adds a code Q&A stage that helps users pick the appropriate code snippet by its characteristic keywords, so that developers do not have to read every returned snippet in detail. Experimental results show that CodeSearcher achieves higher precision than the baseline.
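
The idea of placing descriptions and code snippets in one vector space and ranking snippets by similarity to the query can be illustrated with a minimal sketch. This is not the CodeSearcher model: the paper's learned neural encoders are replaced here by a toy bag-of-words vector, and the names `tokenize`, `embed`, `cosine`, `search` and the two sample snippets are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens, breaking camelCase identifiers."""
    # "readAllLines" -> "read All Lines" so code and prose can share tokens.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def embed(text):
    """Toy stand-in for a learned encoder: a sparse bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, snippets, top_k=1):
    """Rank snippets by similarity to the query in the shared vector space."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]


# Hypothetical code base: only source code, no comments or descriptions.
SNIPPETS = [
    "Document doc = builder.parse(new InputSource(new StringReader(xmlString)));",
    "List<String> lines = Files.readAllLines(Paths.get(fileName));",
]

if __name__ == "__main__":
    print(search("read all lines from a file", SNIPPETS))
```

In a learned setting, `embed` would be replaced by two trained encoders (one for descriptions, one for code) whose outputs are compared in the same way; the ranking step stays the same.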

Key words: Code query, Natural language processing, Stack Overflow

CLC Number: TP391