Computer Science ›› 2020, Vol. 47 ›› Issue (3): 5-10. doi: 10.11896/jsjkx.190500148

Special Issue: Intelligent Software Engineering

Survey of Code Similarity Detection Methods and Tools

ZHANG Dan, LUO Ping

  1. (School of Software, Tsinghua University, Beijing 100084, China)
    (Key Laboratory of Information System Security (Tsinghua University), Ministry of Education, Beijing 100084, China)
  • Received: 2019-05-27 Online: 2020-03-15 Published: 2020-03-30
  • About author: ZHANG Dan, master. Her main research interests include information security and software analysis. LUO Ping, born in 1959, Ph.D., professor. His main research interests include information security and code detection.
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018YFF0215901).

Abstract: Open-sourcing code has become a new trend in the information technology field. While code cloning improves code quality and reduces software development cost to some extent, it also affects the stability, robustness and maintainability of a software system. Code similarity detection therefore plays an important role in the development of computing and information security. To address the hazards brought by code cloning, academia and industry have developed many code similarity detection methods and corresponding tools. According to how they process source code, these detection methods can be roughly divided into five categories: text analysis-based, lexical analysis-based, grammar analysis-based, semantic analysis-based and metrics-based. These detection tools perform well in many application scenarios, but they also face a series of challenges brought by the ever-increasing volume of data in the big data era. This paper first introduces the code cloning problem and compares in detail the five categories of code similarity detection methods. It then classifies and organizes the currently available code similarity detection tools. Finally, it comprehensively evaluates the detection performance of these tools against various evaluation criteria and discusses future research directions for code similarity detection.
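
To make the difference between the first two categories concrete, the following is a minimal sketch and not a method taken from the paper. It assumes that a text analysis-based detector compares raw characters, while a lexical analysis-based detector compares token streams in which identifiers and literals are replaced by placeholders. The function names (`text_similarity`, `normalized_tokens`, `lexical_similarity`) and the two sample snippets are hypothetical and chosen only for illustration.

```python
import difflib
import io
import keyword
import tokenize


def text_similarity(a: str, b: str) -> float:
    """Text-based comparison: match the raw character sequences."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def normalized_tokens(source: str) -> list[str]:
    """Lexical comparison, step 1: tokenize and abstract away names and literals.

    Assumption for this sketch: identifiers become "ID", numbers "NUM",
    strings "STR"; keywords and operators are kept; comments, newlines and
    indentation tokens are dropped.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            tokens.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        elif tok.type == tokenize.OP:
            tokens.append(tok.string)
    return tokens


def lexical_similarity(a: str, b: str) -> float:
    """Lexical comparison, step 2: match the normalized token sequences."""
    return difflib.SequenceMatcher(
        None, normalized_tokens(a), normalized_tokens(b)
    ).ratio()


# Two hypothetical snippets that differ only in identifier names.
ORIGINAL = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
RENAMED = (
    "def accumulate(values):\n"
    "    result = 0\n"
    "    for item in values:\n"
    "        result += item\n"
    "    return result\n"
)

if __name__ == "__main__":
    print("text-based   :", text_similarity(ORIGINAL, RENAMED))
    print("lexical-based:", lexical_similarity(ORIGINAL, RENAMED))
```

Under these assumptions the renamed copy scores below 1.0 on the text-based measure but exactly 1.0 on the lexical one, because both snippets reduce to the same normalized token sequence. Real tools in either category are considerably more elaborate than this sketch.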

Key words: Clone detection, Clone evaluation, Code clone

CLC Number: 

  • TP311
