Computer Science, 2025, Vol. 52, Issue (7): 37-49. doi: 10.11896/jsjkx.240600076

• Computer Software •

Analysis of the Code Quality of Code Automatic Generation Tool GitHub Copilot

WANG Dongyu, MO Ran, ZHAN Wenjing, JIANG Yingjie   

  1. School of Computer Science, Central China Normal University, Wuhan 430079, China
  • Received: 2024-06-11 Revised: 2024-09-21 Published: 2025-07-17
  • About author: WANG Dongyu, born in 2000, postgraduate, is a member of CCF (No. Q5469G). His main research interests include code generation models.
    MO Ran, born in 1989, Ph.D., professor. His main research interests include software architecture analysis, software data mining, software defect analysis, and intelligent software engineering.
  • Supported by:
    Key Programs of the Interdisciplinary Research Platform at Central China Normal University(CCNU24JCPT015).

Abstract: GitHub Copilot is a generative AI-based automatic code generation tool launched by GitHub and OpenAI in 2022. One of its core functions is to generate implementation code from natural-language comments that describe the desired functionality. This expansion of AI into programming has attracted intense discussion and attention in recent years. At present, attention focuses mainly on comparing AI programming with human programming, for example in programming efficiency and code performance. However, research on the characteristics of Copilot-generated code itself remains limited, particularly regarding code quality: what defects the AI-generated code contains, whether these defects might lead to program errors, and how understandable the code is. Code quality directly determines the lifespan and durability of a software project, so analyzing and summarizing the quality characteristics of Copilot-generated code helps in using and improving such AI coding tools. This paper uses tools to extract all open-source problems from LeetCode (2,033 in total) as data samples to test Copilot, generating code suggestions in three programming languages (Java, JavaScript, and Python), submitting them, and recording the execution results of the generated code. By statically analyzing the code suggestions with SonarQube and integrating their execution results, this paper evaluates Copilot's code quality in terms of reliability, maintainability, and complexity. The results reveal that: 1) Copilot-generated code is relatively reliable. For Java, JavaScript, and Python, 7, 5, and 9 types of bugs are identified, respectively. The proportion of code suggestions involving bugs does not exceed 3% in any of the three languages, but over 50% of bug-related code suggestions fail test cases. 2) Copilot's code suggestions exhibit poor maintainability. For Java, JavaScript, and Python, 47, 23, and 20 types of code smells are detected, respectively. Over 40% of code suggestions in all three languages contain code smells, and more than 50% of smell-related suggestions fail test cases. 3) Copilot-generated code is easy to understand. The complexity of most code suggestions does not exceed predefined thresholds, with fewer than 6% of suggestions flagged for excessive complexity. Finally, based on the experimental findings, practical recommendations for improving Copilot are proposed, and potential future research directions for such tools are discussed.
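As an illustration of the two proportions reported for each language (the share of code suggestions with at least one issue, and the share of those flagged suggestions that fail test cases), the following minimal sketch shows how such figures could be computed. The record layout, field names, rule identifiers, and sample data are hypothetical; they are not taken from the paper's tooling or from SonarQube's API.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    """One generated solution (hypothetical record layout)."""
    language: str
    issue_rules: set = field(default_factory=set)  # static-analysis rule ids (assumed)
    passed_tests: bool = True                      # outcome of submitting the code


def summarize(suggestions):
    """Return (share with at least one issue, share of flagged ones that failed tests)."""
    total = len(suggestions)
    flagged = [s for s in suggestions if s.issue_rules]
    failed = [s for s in flagged if not s.passed_tests]
    flagged_share = len(flagged) / total if total else 0.0
    failed_share = len(failed) / len(flagged) if flagged else 0.0
    return flagged_share, failed_share


# Made-up sample: 4 suggestions, 2 flagged, 1 of the flagged ones failed its tests.
sample = [
    Suggestion("python", {"rule-a"}, passed_tests=False),
    Suggestion("python", {"rule-b"}, passed_tests=True),
    Suggestion("python", set(), passed_tests=True),
    Suggestion("python", set(), passed_tests=False),
]

flagged_share, failed_share = summarize(sample)
print(f"flagged: {flagged_share:.0%}, failed among flagged: {failed_share:.0%}")
```

The same function could be applied separately to bug records and to code-smell records for each of the three languages.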

Key words: Automatic code generation, Code quality, Code reliability, Code maintainability, Code complexity

CLC Number: 

  • TP391