Computer Science ›› 2025, Vol. 52 ›› Issue (6): 365-380.doi: 10.11896/jsjkx.240400003
• Information Security • Previous Articles Next Articles
WEI Youyuan1, SONG Jianhua1,3,4, ZHANG Yan2,3
CLC Number:
[1]SUN X J,WEI Q,WANG Y S,et al.Survey of code similarity detection technology[J].Journal of Computer Applications,2024,44(4):1248-1258. [2]NVD.CVE-2023-20892[EB/OL].(2023-06-22) [2024-01-20].https://nvd.nist.gov/vuln/detail/CVE-2023-20892/. [3]ROY C K,CORDY J R,KOSCHKE R.Comparison and evaluation of code clone detection techniques and tools:A qualitative approach[J].Science of Computer Programming,2009,74(7):470-495. [4]UL HAQ I,CABALLERO J.A Survey of Binary Code Similarity[J].ACM Computing Surveys,2021,54(3):1-38. [5]XIA B,PANG J M,ZHOU X,et al.Research progress on binarycode similarity search[J].Journal of Computer Applications,2022,42(4):985-998. [6]ZHOU Z J,DONG R C,JIANG J H,et al.Survey on Binary Code Security Techniques[J].Computer Systems and Applications,2023,32(1):1-11. [7]FANG L,WU Z H,WEI Q.Summary of Binary Code Similarity Detection Techniques[J].Computer Science,2021,48(5):1-8. [8]LI Z,ZOU D Q,XU S H,et al.SySeVR:A Framework for Using Deep Learning to Detect Software Vulnerabilities[J].IEEE Transactions on Dependable and Secure Computing,2022,19(4):2244-2258. [9]XIE C L,LIANG Y,WANG X.Survey of Deep Learning Applied in Code Representation[J].Computer Engineering and Applications,2021,57(20):53-63. [10]BELLON S,KOSCHKE R,ANTONIOl G,et al.Comparisonand evaluation of clone detection tools[J].IEEE Transactions on Software Engineering,2007,33(9):577-591. [11]CHEN Q Y,LI S P,YAN M,et al.Code Clone Detection:A Li-terature Review[J].Journal of Software,2019,30(4):962-980. [12]LE Q Y,LIU J X,SUN X P,et al.Survey of Research Progress of Code Clone Detection[J].Computer Science,2021,48(S2):509-522. [13]WHALE G.Plague:Plagiarism Detection Using Program Structure[D].Sydeny:University of New South Wales,1988. [14]MCCREIGHT E M.A Space-Economical Suffix Tree Construction Algorithm[J].Journal of the ACM,1976,23(2):262-272. [15]UKKONEN E.On-line construction of suffix trees[J].Algorithmica,1995,14(3):249-260. [16]DAVID Y,PARTUSH N,YAHAV E.Similarity of binariesthrough re-optimization[C]//The 38th ACMSIGPLAN Conference on Programming Language Design and Implementation.ACM,2017:79-94. [17]DAVID Y,PARTUSH N,YAHAV E.Statistical similarity ofbinaries[C]//The 37th ACM SIGPLAN Conference on Programming Language Design and Implementation.PLDI,2016:266-280. [18]NETHERCOTE N,SEWARD J.Valgrind:a framework forheavyweight dynamic binary instrumentation[C]//The 28th ACM SIGPLAN Conference on Programming Language Design and Implementation.PLDI,2007:89-100. [19]ZHANG L H,GUI S L,MU F J,et al.Clone Detection Algorithm for Binary Executable Code with Suffix Tree[J].Compu-ter Science,2019,46(10):141-147. [20]ROY C K,CORDY J R.NICAD:Accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization[C]//2008 16th IEEE International Conference on Program Comprehension.IEEE,2008:172-181. [21]XIONG M,XUE Y X,XU Y.A binary code similarity analysis method based on code embedding[J].Cyber Security And Data Governance,2023,42(3):58-67. [22]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[C]//International Conference on Learning Representations.ICLR,2013. [23]LE Q,MIKOLOV T.Distributed representations of sentencesand documents[C]//The 31st International Conference on Machine Learning.PMLR,2014:1188-1196. [24]ZUO F,LI X,YOUNG P,et al.Neural machine translation inspired binary code similarity comparison beyond function pairs[C]//Network and Distributed Systems Security Symposium.NDSS,2019. [25]MASSARELLI L,GIUSEPPE A D L,PETRONI F,et al.Safe:Self-attentive function embeddings for binary similarity[C]//International Conference on Detection of Intrusions and Malware,and Vulnerability Assessment.Cham:Springer,2019:309-329. [26]MASSARELLI L,GIUSEPPE A D L,PETRONI F,et al.Investigating graph embedding neural networks with unsupervised features extraction for binary analysis[C]//The 2nd Workshop on Binary Analysis Research.BAR,2019. [27]MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//The 26th International Conference on Neural Information Processing Systems.NIPS,2013:3111-3119. [28]LIN Z,FENG M,NOGUEIRA DOS SANTOS C,et al.A structured self-attentive sentence embedding[C]//International Conference on Learning Representations.ICLR,2017. [29]BROMLEY J,GUYON I,LECUN Y,et al.Signature verification using a “Siamese” time delay neural network[C]//The 6th International Conference on Neural Information Processing Systems.NIPS,1994:737-744. [30]DING S H H,FUNG B C M,CHARLAND P.Asm2vec:boosting static representation robustness for binary clone search against code obfuscation and compiler optimization[C]//The 2019 IEEE Symposium on Security and Privacy.IEEE,2019:472-489. [31]DEVLIN J,CHANG M W,LEE K,et al.Bert:Pre-training ofdeep bidirectional transformers for language understanding[C]//North American Chapter of the Association for Computational Linguistics.NAACL-HLT,2019:4171-4186. [32]LIU Y H,OTT M,GOYAL N,et al.RoBERTa:A Robustly Optimized BERT Pretraining Approach[C]//International Conference on Learning Representations.ICLR,2020. [33]REIMERS N,GUREVYCH I.Sentence-BERT:Sentence Em-beddings using Siamese BERT-Networks[C]//The 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing.EMNLP,2019:3982-3992. [34]LI X,QU Y,YIN H.Palmtree:learning an assembly language model for instruction embedding[C]//The 2021 ACM SIGSAC Conference on Computer and Communications Security.ACM,2021:3236-3251. [35]LIU B,HUO W,ZHANG C,et al.αdiff:cross-version binary code similarity detection with dnn[C]//The 33rd ACM/IEEE International Conference on Automated Software Engineering.IEEE,2018:667-678. [36]WANG H,QU W,KATZ G,et al.jTrans:jump-aware trans-former for binary code similarity detection[C]//The 31st ACM SIGSOFT International Symposium on Software Testing and Analysis.ACM,2022:1-13. [37]VASWANI A,SHAZEER N,PARMAR N,et al.Attention isall you need[C]//The 31st International Conference on Neural Information Processing Systems.NIPS,2017:5998-6008. [38]YAN Y T,YU L,WANG T Y,et al.Research on Binary Code Similarity Detection Based on Jump-SBERT[J].Computer Science,2024,51(5):355-362. [39]PALMER D D,OSTENDORF M.Improving out-of-vocabulary name resolution[J].Computer Speech & Language,2005,19(1):107-128. [40]WANG T Y,PAN Z L,YU L,et al.Binary Code Similarity Detection Method Based on Pre-training Assembly Instruction Representation[J].Computer Science,2023,50(4):288-297. [41]LI T,WANG J S.Binary code similarity detection via attention mechanism and Child-Sum Tree-LSTM[J].Cyber Security and Data Governance,2023,42(11):8-14,34. [42]AHMED M,SAMEE M,MERCER R.Improving Tree-LSTMwith Tree Attention[C]//2019 IEEE 13th International Confe-rence on Semantic Computing.ICSC,2019:247-254. [43]HUANG C S,ZHU G B,GE G J,et al.FastBCSD:Fast and Efficient Neural Network for Binary Code Similarity Detection[J].arXiv:2306.14168,2023. [44]KIM Y.Convolutional Neural Networks for Sentence Classifica-tion[C]//the 2014 Conference on Empirical Methods in Natural Language Processing.EMNLP,2014:1746-1751. [45]TOLSTIKHIN I,HOULSBY N,KOLESNIKOV A,et al.MLP-mixer:an all-MLP architecture for vision[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems.Red Hook,NY:Curran Associates Inc.,2021:24261-24272. [46]WANG H,GAO Z Y,ZHANG C,et al.CEBin:A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection[C]//The ACM SIGSOFT International Symposium on Software Testing and Analysis.ISSTA,2024. [47]TAKU K.Subword regularization:Improving neural networktranslation models with multiple subword candidates[C]//The 56th Annual Meeting of the Association for Computational Linguistics(Long Papers).Association for Computational Linguistics,2018:66-75. [48]Zynamics.com.BinDiff[EB/OL].(2024-03-09) [2024-03-09].https://www.zynamics.com/bindiff.html. [49]Joxeankoret.Diaphora:A Free and Open Source Program Diffing Tool[EB/OL].(2024-03-12) [2024-03-12].http://diaphora.re/. [50]DING S H H,FUNG B C M,CHARLAND P.Kam1n0:Mapreduce-based assembly clone search for reverse engineering[C]//The 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.KDD,2016:461-470. [51]QIAN F,ZHOU R,XU C,et al.Scalable Graph-based BugSearch for Firmware Images[C]//ACM Sigsac Conference on Computer & Communications Security.CCS,2016:480-491. [52]NG A Y,JORDAN M I,WEISS Y.On spectral clustering:ana-lysis and an algorithm[C]//Proceedings of the 15th International Conference on Neural Information Processing Systems:Natural and Synthetic.Cambridge,MA:MIT,2001:849-856. [53]CHATFIELD K,LEMPITSKY V S,VEDALDI A,et al.The devil is in the details:an evaluation of recent feature encoding methods[C]//British Machine Vision Conference 2011.NIPS,2011. [54]XU X,LIU C,FENG Q,et al.Neural network-based graph embedding for cross platform binary code similarity detection[C]//The 2017 ACM SIGSAC Conference on Computer and Communications Security.CCS,2017:363-376. [55]DAI H J,DAI B,SONG L.Discriminative Embeddings of Latent Variable Models for Structured Data[C]//The 33rd International Conference on International Conference on Machine Learning.ICML,2016:2702-2711. [56]GAO J,YANG X,FU Y,et al.VulSeeker:a semantic learning based vulnerability seeker for cross-platform binary[C]//The 33rd ACM/IEEE International Conference on Automated Software Engineering.ACM,2018:896-899. [57]JIANG S,FU C,QIAN Y K,et al.IFAttn:Binary code similarity analysis based on interpretable features with attention[J].Computers & Security,2022,120:102804. [58]KIM D,KIM E,CHA S K,et al.Revisiting Binary Code Simila-rity Analysis Using Interpretable Feature Engineering and Lessons Learned[C]//IEEE Transactions on Software Enginee-ring.IEEE,2022:1661-1682. [59]JIA A,FAN M,XU X,et al.Cross-Inlining Binary FunctionSimilarity Detection[C]//The IEEE/ACM 46th International Conference on Software Engineering.ICSE,2024:1-13. [60]KINABLE J,KOSTAKIS O.Malware Classification based onCall Graph Clustering[J].Journal in Computer Virology,2010,7:233-245. [61]MASSARELLI L,DI LUNA G A,PETRONI F,et al.Investigating graph embedding neural networks with unsupervised features extraction for binary analysis[C]//the 2nd Workshop on Binary Analysis Research.BAR,2019. [62]YU Z P,CAO R,TANG Q Y,et al.Order matters:Semantic-aware neural networks for binary code similarity detection[C]//The AAAI Conference on Artificial Intelligence.AAAI,2020:1145-1152. [63]CHANDRAMOHAN M,XUE Y X,XU Z Z,et al.Bingo:Crossarchitecture cross-os binary search[C]//The 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering.ACM,2016:678-689. [64]LI Y J,GU C J,DULLIEN T,et al.Graph Matching Networks for Learning the Similarity of Graph Structured Objects[C]//The 36th International Conference on Machine Learning.ICML,2019:3835-3845. |
[1] | ZHU Xiaoyan, WANG Wenge, WANG Jiayin, ZHANG Xuanping. Just-In-Time Software Defect Prediction Approach Based on Fine-grained Code Representationand Feature Fusion [J]. Computer Science, 2025, 52(1): 242-249. |
[2] | LIU Chunling, QI Xuyan, TANG Yonghe, SUN Xuekai, LI Qinghao, ZHANG Yu. Summary of Token-based Source Code Clone Detection Techniques [J]. Computer Science, 2024, 51(6): 12-22. |
[3] | SHEN Nan, CHEN Gang. Formalization of Inverse Matrix Operation Based on Coq [J]. Computer Science, 2023, 50(6A): 220400108-7. |
[4] | GAO Yuzhao, XING Yunhan, LIU Jiaxiang. Constraint-based Verification Method for Neural Networks [J]. Computer Science, 2023, 50(11A): 221000045-5. |
[5] | FANG Lei, WU Ze-hui, WEI Qiang. Summary of Binary Code Similarity Detection Techniques [J]. Computer Science, 2021, 48(5): 1-8. |
[6] | MI Qing, GUO Li-min, CHEN Jun-cheng. Code Readability Assessment Method Based on Multidimensional Features and Hybrid Neural Networks [J]. Computer Science, 2021, 48(12): 94-99. |
[7] | ZHANG Xiong and LI Zhou-jun. Survey of Fuzz Testing Technology [J]. Computer Science, 2016, 43(5): 1-8. |
[8] | WANG Guo-dong,CHEN Ping,MAO Bing,XIE Li. Automatic Generation of Attach-based Signature [J]. Computer Science, 2012, 39(3): 118-123. |
[9] | JIN Ying,LIU Xin,ZHANG Jing. Research on Eliciting Security Requirement Methods [J]. Computer Science, 2011, 38(5): 14-19. |
[10] | XIAO Hai,CHEN Ping,MAO Bing,XIE Li. New Binary System for Detecting and Locating Integer-based Vulnerability on Run-time Type Analysis [J]. Computer Science, 2011, 38(1): 140-144. |
[11] | . [J]. Computer Science, 2009, 36(4): 169-171. |
[12] | . [J]. Computer Science, 2009, 36(1): 252-255. |
|