Computer Science, 2019, Vol. 46, Issue (11): 137-144. doi: 10.11896/jsjkx.191100501C

• Software & Database Technology •

Empirical Study of Code Query Technique Based on Constraint Solving on StackOverflow

CHEN Zheng-zhao, JIANG Ren-he, PAN Min-xue, ZHANG Tian, LI Xuan-dong   

  1. (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China)
  • Received: 2018-10-07 Online: 2019-11-15 Published: 2019-11-14

Abstract: Code query plays an important role in code reuse, and code-related Q&A on StackOverflow, a professional question-and-answer site for programmers, is a typical code reuse scenario. In practice, questions are answered manually, which usually suffers from slow responses, inaccurately described problems, and answers of limited usability. If code query and search could be automated to replace manual answering, considerable manpower and time would be saved. Many code query technologies already exist, but most lack experience of application to real cases. Based on the ideas of Satsy, this paper implements a constraint-solving-based code query technology for the Java language and designs an empirical study around it. The study takes StackOverflow as its research object and mainly examines how to apply constraint-solving-based code query to the code-related Q&A on the website. First, the questions on the website are analyzed, and 35 high-traffic Java questions are extracted as query problems. Then, about 30,000 lines of code are collected from GitHub, converted into constraints, and assembled into a large code base to support code query. Finally, the practical effect of the technology on StackOverflow is evaluated by analyzing the query results for these 35 questions. The results show that, for the specific questions and code scale studied, the technology performs well in practice and can replace manual answering on a considerable scale.
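The query idea summarized in the abstract can be illustrated with a toy sketch. It is not the paper's implementation and not Satsy itself: the three snippet encodings, the `CODE_BASE` name, and the `query` function are hypothetical, written by hand for illustration. A real system would derive constraints from Java code and pass them to an SMT solver; here each snippet is simply a predicate over an input/output pair, so checking satisfiability reduces to evaluating the predicate on the concrete examples.

```python
from typing import Callable, Dict, List, Tuple

# A snippet is modelled as a constraint over (input, output): it holds
# exactly when running the snippet on `x` would yield `y`.
Constraint = Callable[[str, object], bool]

# Hypothetical, hand-written encodings of three small Java string snippets.
# (Python's strip() only approximates Java's trim(); this is an assumption.)
CODE_BASE: Dict[str, Constraint] = {
    "s.length()": lambda x, y: y == len(x),
    "s.toUpperCase()": lambda x, y: y == x.upper(),
    "s.trim()": lambda x, y: y == x.strip(),
}


def query(examples: List[Tuple[str, object]]) -> List[str]:
    """Return the snippets whose constraint holds for every example pair."""
    return [
        name
        for name, holds in CODE_BASE.items()
        if all(holds(x, y) for x, y in examples)
    ]


if __name__ == "__main__":
    # The query is a specification by example: input " ab " should map to "ab".
    print(query([(" ab ", "ab")]))
    # A weak example matches several snippets; adding examples narrows the result.
    print(query([("AB", "AB")]))
    print(query([("AB", "AB"), (" x ", "x")]))
```

A single weak example can match more than one snippet, which is why a query in this style usually supplies several input/output pairs.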

Key words: Code query, Constraint solving, Open-source code database, Empirical study

CLC Number: TP311.5