Computer Science, 2025, Vol. 52, Issue (7): 37-49. doi: 10.11896/jsjkx.240600076
王东煜, 莫然, 詹文静, 蒋颖婕
WANG Dongyu, MO Ran, ZHAN Wenjing, JIANG Yingjie
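Abstract: GitHub Copilot is a generative-AI code generation tool launched by GitHub and OpenAI. One of its core functions is to generate implementation code from a natural-language description. This extension of AI into programming has drawn wide discussion and attention in recent years. Existing work focuses mainly on comparing AI programming with human programming, for example the programming efficiency of AI and human programmers and the performance of the code each produces. However, few studies examine the characteristics of Copilot's code itself, in particular its quality: what defects AI-generated code contains, whether those defects lead to program errors, and whether the code is easy to understand. Code quality is critical to software development, and analyzing the quality of AI-generated code helps in using and improving such code generation tools. This paper uses a tool to extract all freely accessible problems from LeetCode (2033 in total) as the data sample for testing Copilot, generates code suggestions in three languages (Java, JavaScript, and Python), submits them, and records their execution results. The code suggestion files are then statically analyzed with SonarQube and, together with the execution results, used to characterize Copilot's code quality along three dimensions: reliability, maintainability, and complexity. The results show that: 1) Copilot-generated code is fairly reliable. For Java, JavaScript, and Python, 7, 5, and 9 bug types were collected respectively, and in each language no more than 3% of code suggestions involve a bug, but more than 50% of the suggestions that do involve a bug fail the tests. 2) Copilot code suggestions have poor maintainability. For Java, JavaScript, and Python, 47, 23, and 20 code smell types were collected respectively, more than 40% of code suggestions in each language involve a code smell, and more than 50% of the suggestions that involve a code smell fail the test cases. 3) Copilot code is easy to understand. The complexity of most code suggestions stays below the threshold, and no more than 6% of code suggestions have abnormal complexity. Finally, based on the experimental results, the paper proposes practical recommendations for maintaining Copilot and discusses possible future research directions for tools of this kind.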
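The two reliability figures quoted in the abstract (the share of suggestions that involve a bug, and the share of those that fail the tests) can be illustrated with a minimal sketch. The `Suggestion` record and the `bug_stats` function are hypothetical names introduced here for illustration; they are not taken from the paper, and the sketch assumes each suggestion has already been labeled from the static analysis report and the submission result.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """One generated code suggestion (hypothetical record layout)."""
    language: str
    has_bug: bool       # assumed: static analysis reported at least one bug
    passed_tests: bool  # assumed: the submission was accepted


def bug_stats(suggestions):
    """Return (share of suggestions with a bug, share of those that failed)."""
    flagged = [s for s in suggestions if s.has_bug]
    bug_ratio = len(flagged) / len(suggestions) if suggestions else 0.0
    failed = sum(1 for s in flagged if not s.passed_tests)
    fail_ratio = failed / len(flagged) if flagged else 0.0
    return bug_ratio, fail_ratio
```

The same two ratios apply to code smells by swapping the `has_bug` flag for a smell flag; computing them per language only requires filtering the list on `language` first.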