Computer Science, 2023, Vol. 50, Issue (6A): 220100255-7. doi: 10.11896/jsjkx.220100255

• Information Security •

Cross-architecture Cryptographic Algorithm Recognition Based on IR2Vec

ZHAO Chenxia1,2, SHU Hui2, SHA Zihan2   

  1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China;
  2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Online: 2023-06-10 Published: 2023-06-12
  • About author: ZHAO Chenxia, born in 1995, master's candidate. Her main research interests include cyber security and reverse engineering. SHU Hui, born in 1974, Ph.D, professor, Ph.D supervisor. His main research interests include cyber security and reverse engineering.
  • Supported by:
    National Key R&D Program of China (2019QY1305).

Abstract: Encryption is widely used to protect information, so identifying the cryptographic algorithms contained in an executable file is important for information security. Most existing cryptographic algorithm recognition techniques target a single architecture and perform poorly in cross-architecture scenarios. This paper therefore proposes the IR2Vec model for cross-architecture cryptographic algorithm recognition. The model addresses the cross-architecture problem by exploiting the way LLVM connects different front ends and back ends through a common intermediate representation (IR). An executable file is first decompiled into IR code by LLVM-RetDec. An improved PV-DM model then embeds the semantics of the IR code as vectors, and semantic similarity is judged by computing the cosine distance between vectors. A cryptographic algorithm library is built from a variety of collected cryptographic algorithms; the target executable is compared one by one with the files in this library, and the entry with the highest similarity is taken as the recognition result. Experimental results show that the method can effectively identify the cryptographic algorithms in executable files. The model supports cross recognition of binaries across the x86, ARM and MIPS architectures, the Clang and GCC compilers, and the O0, O1, O2 and O3 optimization levels.
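The matching step described in the abstract (compare the target's vector with every library entry and keep the most similar one) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function names `cosine_similarity` and `identify`, the library contents, and the 4-dimensional vectors are all hypothetical, and the embedding step (the improved PV-DM model applied to decompiled IR) is assumed to have already produced the vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def identify(target_vec, library):
    """Return (name, score) of the library entry most similar to target_vec."""
    best_name, best_score = None, -1.0
    for name, ref_vec in library.items():
        score = cosine_similarity(target_vec, ref_vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical embeddings; real vectors would come from the embedding model.
library = {
    "AES": [0.9, 0.1, 0.0, 0.2],
    "MD5": [0.1, 0.8, 0.3, 0.0],
    "RC4": [0.0, 0.2, 0.9, 0.1],
}
target = [0.8, 0.2, 0.1, 0.3]

name, score = identify(target, library)
print(name, round(score, 3))
```

Picking the highest cosine similarity is equivalent to picking the smallest cosine distance, since the distance is one minus the similarity.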

Key words: Similarity recognition, Cross-architecture, Cryptographic algorithm, LLVM

CLC Number: TP393