Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230800078-7. doi: 10.11896/jsjkx.230800078

• Computer Software & Architecture •

Fuzz Testing Method of Binary Code Based on Deep Reinforcement Learning

WANG Shuanqi1, ZHAO Jianxin2, LIU Chi2, WU Wei1, LIU Zhao1

  1 Information Center of China North Industries Group Corporation, Beijing 100089, China
  2 School of Computer Science, Beijing Institute of Technology, Beijing 100081, China
  • Published: 2024-06-06
  • Corresponding author: WANG Shuanqi (93660036@qq.com)
  • About author: WANG Shuanqi, born in 1984, Ph.D., senior engineer. His main research interests include software test verification and vulnerability mining.
  • Supported by:
    Large-scale Industrial Software Research and Development Project (ZQ2020D204007)

Abstract: Vulnerability mining is a major research direction in computer software security, and fuzz testing is an important dynamic mining method. In binary code vulnerability mining, the large volume of assembly code makes detection difficult and time-consuming, and fuzz testing is inefficient. To address these problems, a binary code fuzz testing method based on deep reinforcement learning is proposed. First, the fuzz testing process is modeled as a multi-step Markov decision process suited to reinforcement learning, and a deep reinforcement learning model is built to assist the selection of fuzzing mutation strategies, so that the mutation strategy is optimized dynamically. Then, a binary code fuzz testing platform based on deep reinforcement learning is designed and built: AFL implements the fuzz testing environment, and the Keras-RL2 library and the OpenAI Gym framework implement the deep reinforcement learning algorithm and the reinforcement learning environment. Finally, experimental analysis verifies the effectiveness and applicability of the proposed method and testing platform. The experimental results show that the deep reinforcement learning model helps the fuzz testing process cover more paths quickly and expose more vulnerabilities and defects, significantly improving the efficiency of binary code vulnerability mining and localization.

Key words: Binary code, Vulnerability mining, Fuzz testing, Deep reinforcement learning, Testing platform
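To make the Markov-decision-process framing in the abstract concrete, the following is a minimal pure-Python sketch, not the authors' platform. It uses neither AFL, Keras-RL2, nor OpenAI Gym: `toy_target` is a hypothetical stand-in for an instrumented binary, the three mutators in `MUTATORS` are illustrative actions, `ToyFuzzEnv` only imitates a Gym-style `reset`/`step` interface, and tabular Q-learning is substituted for the deep reinforcement learning model. The state (number of branches covered so far) and the reward (number of newly covered branches) are assumptions chosen for illustration.

```python
import random

# Hypothetical mutation operators forming the action space (not AFL's mutators).
def bit_flip(data, rng):
    out = bytearray(data)
    out[rng.randrange(len(out))] ^= 1 << rng.randrange(8)
    return bytes(out)

def byte_replace(data, rng):
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)

def byte_increment(data, rng):
    out = bytearray(data)
    i = rng.randrange(len(out))
    out[i] = (out[i] + 1) % 256
    return bytes(out)

MUTATORS = [bit_flip, byte_replace, byte_increment]

def toy_target(data):
    """Stand-in for an instrumented binary: returns the branch ids an input hits."""
    hit = {0}
    if data[0] == ord("F"):
        hit.add(1)
        if data[1] == ord("U"):
            hit.add(2)
            if data[2] == ord("Z"):
                hit.add(3)
    return frozenset(hit)

class ToyFuzzEnv:
    """Gym-like environment: action = mutator index, reward = new branches."""
    def __init__(self, seed_input=b"AAAA", max_steps=50, seed=0):
        self.seed_input = seed_input
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.current = self.seed_input
        self.covered = set(toy_target(self.current))
        self.steps = 0
        return len(self.covered)

    def step(self, action):
        candidate = MUTATORS[action](self.current, self.rng)
        new = toy_target(candidate) - self.covered
        if new:
            # Keep inputs that reach new code, as coverage-guided fuzzers do.
            self.covered |= new
            self.current = candidate
        self.steps += 1
        done = self.steps >= self.max_steps
        return len(self.covered), float(len(new)), done, {}

def train(env, episodes=200, alpha=0.1, gamma=0.9, epsilon=0.2, seed=1):
    """Tabular epsilon-greedy Q-learning over (state, action) pairs."""
    rng = random.Random(seed)
    actions = range(len(MUTATORS))
    q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(len(MUTATORS))
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done, _ = env.step(action)
            best_next = max(q.get((next_state, a), 0.0) for a in actions)
            target = reward if done else reward + gamma * best_next
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (target - old)
            state = next_state
    return q
```

Calling `train(ToyFuzzEnv())` returns a table mapping `(state, action)` pairs to learned values. In a setup closer to the one described above, the toy target would presumably be replaced by coverage feedback from AFL and the table by a neural network.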

CLC Number: 

  • TP311