Computer Science, 2024, Vol. 51, Issue (6): 12-22. doi: 10.11896/jsjkx.230400117
LIU Chunling, QI Xuyan, TANG Yonghe, SUN Xuekai, LI Qinghao, ZHANG Yu
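Abstract: Code clones are code fragments that are textually or structurally similar. They arise when source code is reused, modified, or refactored during software development. Code cloning can improve development efficiency and reduce development costs. However, it can also propagate bugs and harm software stability and maintainability. Code clone detection therefore has significant research value and practical application in plagiarism detection, vulnerability detection, copyright infringement detection, and related fields. Token-based (lexical) clone detection techniques can quickly detect Type-1 to Type-3 clones and can be extended to other programming languages. They have been widely applied to large-scale clone detection tasks. This paper reviews research on token-based clone detection over the past five years. It classifies the techniques into four categories according to the basic computational granularity of their similarity algorithms. It then analyzes and summarizes more than ten technical characteristics and discusses the limitations and challenges of these techniques. Finally, in light of emerging technologies, it proposes possible future research directions for token-based clone detection.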
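The following Python sketch illustrates what "token-based" detection means in practice. It is not the method of any surveyed tool. The regex lexer and the small keyword set are hypothetical simplifications, not a real language grammar. The sketch normalizes identifiers and literals so that renamed copies (Type-2 clones) produce identical token streams. It then applies a bag-of-tokens overlap test in the style used by SourcererCC. Two fragments count as clones when their shared tokens cover at least a fraction θ (`theta`) of the larger fragment. Overlap on token bags is only one possible similarity granularity, and real detectors add indexing and filtering so that the comparison scales.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexer: double-quoted strings, identifiers/keywords, numbers,
# and any other single non-whitespace character (so "==" becomes two tokens).
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S')

# Illustrative keyword set (assumption); keywords are kept verbatim.
KEYWORDS = {"int", "return", "for", "if", "else", "while"}


def tokenize(code: str, normalize: bool = True) -> list[str]:
    """Split code into tokens. With normalize=True, literals become 'LIT' and
    non-keyword identifiers become 'ID', so renamed copies compare equal."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if normalize:
            if tok.startswith('"') or tok[0].isdigit():
                tok = "LIT"
            elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
                tok = "ID"
        tokens.append(tok)
    return tokens


def overlap_similar(a: str, b: str, theta: float = 0.8) -> bool:
    """Bag-of-tokens overlap test: report a clone pair when the multiset
    intersection covers at least theta of the larger token bag."""
    bag_a, bag_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared = sum((bag_a & bag_b).values())
    larger = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared >= math.ceil(theta * larger)
```

In this toy setting, `int sum(int a, int b) { return a + b; }` and a copy with renamed variables produce identical normalized streams, so they pass at θ = 0.8. A variant with an added statement, `int add(int x, int y) { int s = x + y; return s; }` (a Type-3 clone), passes only at a looser θ = 0.7. This shows how the threshold trades recall on gapped clones against false positives.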