Computer Science, 2023, Vol. 50, Issue (6A): 220100255-7. doi: 10.11896/jsjkx.220100255

• Information Security •

Cross-architecture Cryptographic Algorithm Recognition Based on IR2Vec

ZHAO Chenxia1,2, SHU Hui2, SHA Zihan2

  1 School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China;
  2 State Key Laboratory of Mathematical Engineering and Advanced Computing, Information Engineering University, Zhengzhou 450001, China
  • Online: 2023-06-10 Published: 2023-06-12
  • Corresponding author: SHU Hui (shuhui123@126.com)
  • About author: (sidefacezcx@sina.com) ZHAO Chenxia, born in 1995, master's candidate. Her main research interests include cyber security and reverse engineering. SHU Hui, born in 1974, Ph.D., professor, Ph.D. supervisor. His main research interests include cyber security and reverse engineering.
  • Supported by:
    National Key R&D Program of China (2019QY1305).

Abstract: In information security, encryption is used to protect information, so identifying cryptographic algorithms in executable files is of considerable importance. Most existing cryptographic algorithm recognition techniques target a single architecture and perform poorly in cross-architecture scenarios. This paper therefore proposes the IR2Vec model to address cryptographic algorithm recognition across architectures. The model first exploits the way LLVM connects different front ends and back ends to handle the cross-architecture problem: executable files are decompiled into an intermediate representation (IR) using LLVM-RetDec. An improved PV-DM model then embeds the semantics of the IR as vectors, and semantic similarity is judged by computing the cosine distance between vectors. A cryptographic algorithm library is built from a collection of cryptographic algorithms; the target executable is compared one by one with the files in the library, and the entry with the highest similarity is taken as the recognition result. Experimental results show that the technique can effectively identify cryptographic algorithms in executable files. The model supports cross recognition of binaries across three architectures (X86, ARM and MIPS), two compilers (Clang and GCC) and four optimization levels (O0, O1, O2 and O3).

Key words: Similarity recognition, Cross-architecture, Cryptographic algorithm, LLVM
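
The final matching step described in the abstract (compare the target's vector with every library entry and take the most similar one) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `cosine_similarity` and `identify_algorithm`, the `LIBRARY` dictionary, and the three-dimensional toy vectors are all assumptions made for illustration. In the paper the vectors would instead come from the improved PV-DM model applied to RetDec-generated IR.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def identify_algorithm(
    target: List[float], library: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Score the target against every library entry; return the best label and its score."""
    if not library:
        raise ValueError("library is empty")
    scores = {name: cosine_similarity(target, vec) for name, vec in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical library: toy 3-dimensional vectors invented for illustration only.
LIBRARY = {
    "AES": [0.9, 0.1, 0.0],
    "MD5": [0.0, 1.0, 0.2],
    "RC4": [0.1, 0.0, 1.0],
}

if __name__ == "__main__":
    # Hypothetical embedding of a target executable.
    label, score = identify_algorithm([0.8, 0.2, 0.1], LIBRARY)
    print(label, round(score, 3))
```

With these toy vectors the target lies closest to the entry labelled "AES". A practical system would probably also need a minimum-similarity threshold so that binaries containing no known algorithm are not forced onto the nearest label; the abstract does not say whether the paper uses one.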

CLC Number:

  • TP393