Computer Science ›› 2020, Vol. 47 ›› Issue (9): 1-9. doi: 10.11896/jsjkx.191200170

• Computer Software •

CodeSearcher: Code Query Using Functional Descriptions in Natural Languages

LU Long-long, CHEN Tong, PAN Min-xue, ZHANG Tian   

  1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2019-12-30 Published: 2020-09-10
  • About author: LU Long-long, born in 1993, master. His main research interests include software verification and model checking.
    PAN Min-xue, born in 1983, Ph.D., associate professor, is a member of China Computer Federation. His main research interests include software modelling & verification, software analysis & testing, mobile computing and intelligent software engineering.
  • Supported by:
    National Natural Science Foundation of China (61972193) and Fundamental Research Funds for the Central Universities (14380022, 14380020).

Abstract: When a developer must implement a function but does not know how to do so in a specific programming language, he or she usually resorts to querying for code in natural language. Performing such code queries while programming is time-consuming and labor-intensive. Many code query tools have been proposed in recent years to assist developers, but most of them require complex inputs or have low precision. We propose CodeSearcher, a new code query approach based on natural language descriptions. Relying on 〈natural language description, code snippet〉 data pairs extracted from Stack Overflow, a Q&A website for software development, we design a neural network model and a corresponding training method that map “natural language descriptions” and “code snippets” to the same vector space. CodeSearcher differs from conventional code query systems in two respects. First, it accepts any user-provided code base for searching, because the system relies only on the source code and not on its comments or descriptions. Second, it no longer limits the code query process to “entering a natural language description and getting back code snippets”, but adds a code Q&A component that helps users pick the appropriate code snippet by its characteristic keywords, so that developers do not have to read every returned snippet in detail. The experimental results show that CodeSearcher achieves higher precision than the baseline.
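The retrieval step implied by a shared vector space can be illustrated with a minimal Python sketch. It is not the CodeSearcher model: the `embed` function here is a hypothetical stand-in (a plain bag-of-words count vector) for the learned neural encoders, and `tokenize`, `cosine`, `search`, and the sample `SNIPPETS` are likewise assumptions made for illustration. The sketch only shows how a query and code snippets, once placed in one vector space, can be ranked by cosine similarity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text or code into lowercase word tokens, breaking camelCase names."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Hypothetical stand-in for a learned encoder: a sparse bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, snippets, top_k=3):
    """Rank snippets by similarity to the query in the shared vector space."""
    q = embed(query)
    scored = [(cosine(q, embed(s)), s) for s in snippets]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Illustrative code base (assumed for this sketch, not taken from the paper).
SNIPPETS = [
    "Document parseXml(String xml) { return builder.parse(new InputSource(new StringReader(xml))); }",
    "List<String> readLines(File file) { return Files.readAllLines(file.toPath()); }",
    "int sumArray(int[] values) { int total = 0; for (int v : values) total += v; return total; }",
]

if __name__ == "__main__":
    for score, snippet in search("read lines from a file", SNIPPETS, top_k=1):
        print(round(score, 3), snippet)
```

In a learned system, `embed` would be replaced by two trained encoders, one for descriptions and one for code, so that matching does not depend on the query and the snippet sharing literal tokens.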

Key words: Code query, Natural language processing, Stack Overflow

CLC Number: TP391