Computer Science ›› 2019, Vol. 46 ›› Issue (10): 141-147.doi: 10.11896/jsjkx.180801573

• Information Security • Previous Articles     Next Articles

Clone Detection Algorithm for Binary Executable Code with Suffix Tree

ZHANG Ling-hao1, GUI Sheng-lin2,3, MU Feng-jun2, WANG Sheng1   

  1. (State Grid Sichuan Electric Power Research Institute,Chengdu 610000,China)1
    (School of Computer Science and Engineering,University of Electronic Science and Technology of China,Chengdu 611731,China)2
    (The 30th Institute of China Electronics Technology Group Corporation,Chengdu 610041,China)3
  • Received:2018-08-25 Revised:2019-01-08 Online:2019-10-15 Published:2019-10-21

Abstract: How to detect code clones is an important issue in software maintenance and software infringements.Clone detection techniques based on source code tend to fail in the infringement disputes of commercial software due to trade secret.Therefore,in the scenario when the source code is unavailable for detection,this paper presented a clone detection algorithm based on binary executable file analysis.Firstly,instruction type sequences of binary executable files are obtained by decompilation instruction type abstraction,then a suffix tree is constructed based on these instruction type sequences.The clone pairs among functions can be figured out based on this suffix tree.In addition,this paper eliminated gravel instructions for enhancing performance.At last,based on MIPS32 instruction set,this paper used respectively Linux kernel and obfuscated test code as samples on clone level 0-level 2 and level 1-level 4 to compare with the source code detection tools.Test results show that even in the scenario where the source code is lacking,this algorithm can also perform fine-grained clone analysis and has high detection performance for code clones at all levels.

Key words: Binary executable file, Code cone, Performance optimization, Suffix tree

CLC Number: 

  • TP311.5
[1]United States Court of Appeals for the Federal Circuit:JACOBSENV.KATZER [A/OL].(2018-03-27)[2018-07-07].http://www.cafc.uscourts.gov/sites/default/files/opinions-orders/08-1001.pdf.
[2]LV H.Computer software copyright infringement judge[D]. Lanzhou:Lanzhou University,2015.(in Chinese)
吕浩.计算机软件著作权侵权的判定研究[D].兰州:兰州大学,2015.
[3]DONG H,TANG Q.Review of Jacobsen V.Katzer Case:The Influence of the Differences Between Chinese and American Copyright Systems on the Nature of Open Source Agreements and Its Enlightenment[J].China Copyright,2009,2009(3):48-51.(in Chinese)
董皓,唐磬.Jacobsen V.Katzer案评述:中美版权制度差异对开源协议性质的影响及启示[J].中国版权,2009,2009(3):48-51.
[4]SU Q,WU W M,LI Z L,et al.Research and Application of Chaos Opaque Predicate in Code Obfuscation[J].Computer Science,2013,40(6):155-159.(in Chinese)
苏庆,吴伟民,李忠良,李景樑,陈为德.混沌不透明谓词在代码混淆中的研究与应用[J].计算机科学,2013,40(6):155-159.
[5]ROY,CHANCHAL K,CORDY,et al.Comparison and evaluation of code clone detection techniques and tools:A qualitative approach[J].Science of Computer Programming,2009,74(7):470-495.
[6]RATTAN D,BHATIA R,SINGH M.Software clone detection:A systematic review[J].Information & Software Technology,2013,55(7):1165-1199.
[7]SU X H,ZHANG F L.A Survey for Management Oriented Code Clone Research[J].Chinese Journal of Computers,2018,41(3):628-651.(in Chinese)
苏小红,张凡龙.面向管理的克隆代码研究综述[J].计算机学报,2018,41(3):628-651.
[8]CHEN H,GUO T,CUI B J,et al.Comparison and analysis on binary file similarity detection technique[C]//Proceedings of Conference on Vulnerability Analysis and Risk.Beijing:China Information Technology Security Evalution Center,2011.
[9]JOHNSON J H.Substring matching for clone detection and change tracking[C]//Proceedings of International Conference on Software Maintenance,1994.New York:IEEE,1994:120-126.
[10]KAMIYA T,KUSUMOTO S,INOUE K.CCFinder:A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code[J].IEEE Transactions on Software Engineering,2002,28(7):654-670.
[11]SAINI V,SAJNANI H,KIM J,et al.SourcererCC and SourcererCC-I:tools to detect clones in batch mode and during software development[C]// Proceedings of the 38th Proceedings of International Conference on Software Engineering Companion,2016.New York:IEEE,2016:597-600.
[12]SUDHAMANI M,RANGARAJAN L.Code clone detection based on order and content of control statements[C]//Procee-dings of International Conference on Contemporary Computing &Informatics.New York:IEEE,2017:59-64.
[13]WANG M,WANG P,XU Y.CCSharp:An Efficient Three-Phase Code Clone Detector Using Modified PDGs[C]//Procee-dings of Asia-pacific Software Engineering Conference.New York:IEEE,2018:100-109.
[14]ROY C K,CORDY J R.NICAD:Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization[C]//Proceedings of IEEE International Confe-rence on Program Comprehension.Washington DC:IEEE Computer Society,2008:172-181.
[15]WHITE M,TUFANO M,VENDOME C,et al.Deep learning code fragments for code clone detection[C]//Proceedings of IEEE/ACM International Conference on Automated Software Engineering.NewYork:IEEE,2016:87-98.
[16]RAGKHITWETSAGUL C,KRINKE J,MARNETTE B.A picture is worth a thousand words:Code clone detection based on image similarity[C]//Proceedings of International Workshop on Software Clones.NewYork:IEEE,2018:44-50.
[17]SÆBJØRNSEN A,WILLCOCK J,PANAS T,et al.Detecting code clones in binary executables[C]//Proceedings of Eighteenth International Symposium on Software Testing and Ana-lysis.New York:ACM,2009:117-128.
[18]DONG M,ZHUANG H,ZHANG R,et al.A New Method of Software Clone Detection Based on Binary Instruction Structure Analysis[C]//Proceedings of International Conference on Wireless Communications.NewYork:IEEE,2013:1-4.
[19]HU Y K,ZHANG Y Y,LI J R,et al.Binary code clone detection across architectures and compiling configurations[C]//Proceedings of International Conference on Program Comprehension.New York:IEEE,2017:88-98.
[20]TIS Committee.Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification[S/OL].http://refspecs.linuxbase.org/elf/elf.pdf.
[21]MIPS32® Instruction Set Quick Reference [S/OL].https://www.mips.com/?do-download=mips32-instruction-set-quick-reference-v1-01.
[22]MCCREIGHT E M.A Space-Economical Suffix Tree Construction Algorithm[J].Journal of the Acm,1976,23(2):262-272.
[23]UKKONEN E.On-line construction of suffix trees[J].Algorithmica,1995,14(3):249-260.
[24]rswier c4:C in four functions[CP/OL].(2017-08-22) [2018-08-22].https://github.com/rswier/c4.git.
[25]C/C++ Obfuscator[EB/OL].(2018-08-09)[2018-08-09]http://stunnix.com/prod/cxxo/.
[26]UDDIN M S,ROY C K,SCHNEIDER K A.SimCad:An extensible and faster clone detection tool for large scale software systems[C]//Proceedings of International Conference on Program Comprehension.NewYork:IEEE,2013:236-238.
[1] CHEN Jun-wu, YU Hua-shan. Strategies for Improving Δ-stepping Algorithm on Scale-free Graphs [J]. Computer Science, 2022, 49(6A): 594-600.
[2] CHEN Le, GAO Ling, REN Jie, DANG Xin, WANG Yi-hao, CAO Rui, ZHENG Jie, WANG Hai. Adaptive Bitrate Streaming for Energy-Efficiency Mobile Augmented Reality [J]. Computer Science, 2022, 49(1): 194-203.
[3] E Hai-hong, ZHANG Tian-yu, SONG Mei-na. Web-based Data Visualization Chart Rendering Optimization Method [J]. Computer Science, 2021, 48(3): 119-123.
[4] ZHANG Xiao, ZHANG Si-meng, SHI Jia, DONG Cong, LI Zhan-huai. Review on Performance Optimization of Ceph Distributed Storage System [J]. Computer Science, 2021, 48(2): 1-12.
[5] XU Jiang-feng and TAN Yu-long. Research on HBase Configuration Parameter Optimization Based on Machine Learning [J]. Computer Science, 2020, 47(6A): 474-479.
[6] ZHANG Peng-yi, SONG Jie. Research Advance on Efficiency Optimization of Blockchain Consensus Algorithms [J]. Computer Science, 2020, 47(12): 296-303.
[7] XU Chuan-fu,WANG Xi,LIU Shu,CHEN Shi-zhao,LIN Yu. Large-scale High-performance Lattice Boltzmann Multi-phase Flow Simulations Based on Python [J]. Computer Science, 2020, 47(1): 17-23.
[8] XU Qi-ze, HAN Wen-ting, CHEN Jun-shi, AN Hong. Optimization of Breadth-first Search Algorithm Based on Many-core Platform [J]. Computer Science, 2019, 46(1): 314-319.
[9] SUN Tao, ZHANG Jun-xing. Review of SDN Performance Optimization Technology [J]. Computer Science, 2018, 45(11A): 84-91.
[10] ZOU Ling-jun, CHEN Ling and DAI Cai-yan. Detecting Community from Bipartite Network Based on Generalized Suffix Tree [J]. Computer Science, 2017, 44(7): 221-226.
[11] SUN Zhi-long, Edwin H-M Sha, ZHUGE Qing-feng, CHEN Xian-zhang and WU Kai-jie. Research on Data Consistency for In-memory File Systems [J]. Computer Science, 2017, 44(2): 222-227.
[12] NI You-cong, LI Song, YE Peng and DU Xin. Random Search Rule Based Performance Evolutionary Optimization Method at Software Architecture Level [J]. Computer Science, 2017, 44(11): 156-163.
[13] ZHAO Li-wei, CHEN Xian-zhang and ZHUGE Qing-feng. Performance Comparison of Join Operations on SIMFS and EXT4 [J]. Computer Science, 2016, 43(6): 184-187.
[14] KE Ye-qing, MA Zhi-rou, WU Hai-jiang and LIU Jie. SmartHR:A Resume Query and Management System Based on Semantic Web [J]. Computer Science, 2015, 42(12): 56-59.
[15] DU Xin, WANG Chun-yan, NI You-cong, YE Peng and XIAO Ru-liang. Rule-based Performance Optimization Model at Software Architecture Level [J]. Computer Science, 2015, 42(10): 189-192.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!