计算机科学 ›› 2024, Vol. 51 ›› Issue (10): 208-217.doi: 10.11896/jsjkx.230700008

• 计算机软件 • 上一篇    下一篇

高健壮性二进制应用程序裁剪

丁铎1,2, 孙聪1, 郑涛3   

  1. 1 西安电子科技大学网络与信息安全学院 西安 710071
    2 中国电子科技集团公司第五十四研究所 石家庄 050050
    3 中国航空工业集团公司西安航空计算技术研究所 西安 710068
  • 收稿日期:2023-07-03 修回日期:2023-11-14 出版日期:2024-10-15 发布日期:2024-10-11
  • 通讯作者: 孙聪(suncong@xidian.edu.cn)
  • 作者简介:(dingduo54@qq.com)
  • 基金资助:
    国家自然科学基金(62272366);陕西省重点研发计划(2023-YBGY-371)

Robust Binary Program Debloating

DING Duo1,2, SUN Cong1, ZHENG Tao3   

  1. 1 School of Cyber Engineering,Xidian University,Xi'an 710071,China
    2 The 54th Research Institute of China Electronics Technology Group Corporation Electronic Equipment,Shijiazhuang 050050,China
    3 AVIC XI'AN Aeronautics Computing Technique Research Institute,Xi'an 710068,China
  • Received:2023-07-03 Revised:2023-11-14 Online:2024-10-15 Published:2024-10-11
  • About author:DING Duo,born in 1995,master,assistant engineer.His main research intere-sts include software security and so on.
    SUN Cong,born in 1982,Ph.D,professor,Ph.D supervisor,is a member of CCF(No.28286M).His main research interests include software security,program analysis,and high-confidence software.
  • Supported by:
    National Natural Science Foundation of China(62272366) and Key Research and Development Program of Shaanxi Province(2023-YBGY-371).

摘要: 应用程序的常用功能仅占其所有功能的小部分。冗余功能代码造成应用程序攻击面扩大,从而增大代码重用攻击风险。二进制程序裁剪能够在分析应用程序二进制的基础上,识别并删除程序冗余代码,减小程序攻击面。现有二进制裁剪方法依赖人工构造的输入产生初始控制流,并依赖启发式方法扩展控制流图,导致方法健壮性和可扩展性受限。文中提出并实现了一种高健壮性二进制应用程序裁剪方法(RBdeb),使用黑盒模糊测试技术获取具有更高健壮性的合法执行轨迹集合,基于图同构算法自动分类相似库函数,提出的路径发现算法从初始执行轨迹构成的二进制控制流子图出发,扩展二进制控制流路径和同类库函数调用,生成高健壮性的裁剪结果二进制文件。实验结果表明,相比现有方案,RBdeb具有更高的路径覆盖率和裁剪后二进制健壮性,路径发现算法和库分类方法具有更强的可扩展性,所提方法能够裁剪大规模实际应用程序。

关键词: 程序裁剪, 二进制分析, 模糊测试, 二进制重写, 程序分析

Abstract: The frequently used functionalities usually constitute a small portion of applications' functionalities.The redundant code for rarely used functionalities raises the attack surface of the applications,thus causing the potential risk of code reuse attacks.Binary program debloating can identify and remove the redundant code based on the binary analysis of the application,so as to reduce the attack surface.The state-of-the-art binary program debloating approach relies on artificially crafted inputs to derive the initial control flows.It uses heuristics to extend the binary control-flow graph for debloating.Such an approach has limited robustness and scalability.This paper proposes and implements a robust binary program debloating approach(RBdeb).It uses black-box fuzzing to derive highly-robust valid execution traces of the binary,and categorizes similar library functions automatically based on the graph isomorphism algorithm.The proposed path discovery algorithm extends the binary control flows with the classified library function calls from the control-flow sub-graph of the initial execution traces and generates the robust binary file as the debloating result.Experimental results demonstrate that RBdeb has higher path coverage and debloated binary robustness than the state-of-the-art approaches.The path discovery algorithm and library function categorization are more scalable.RBdeb can effectively debloat large real-world applications.

Key words: Program debloating, Binary analysis, Fuzzing, Binary rewriting, Program analysis

中图分类号: 

  • TP314
[1]QUACH A,ERINFOLAMI R,DEMICCO D,et al.A Multi-OS Cross-Layer Study of Bloating in User Programs,Kernel and Managed Execution Environments[C]//Proceedings of the 2017 Workshop on Forming an Ecosystem Around Software Transformation.New York:ACM,2017:65-70.
[2]WILLIAMS C.Anatomy of OpenSSL's Heartbleed:Just FourBytes Trigger Horror Bug [EB/OL].(2014-04-09)[2023-05-13].https://www.theregister.com/2014/04/09/heartbleed_explained.
[3]PENG G,LIANG Y,ZHANG H,et al.Survey on software binary code reuse technologies [J].Ruan Jian Xue Bao/Journal of Software,2017,28(8):2026-2045.
[4]XIN Q,ZHANG Q,ORSO A.Studying and Understanding the Tradeoffs Between Generality and Reduction in Software De-bloating[C]//Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering.New York:ACM,2022:99:1-99:13.
[5]REGEHR J,CHEN Y,CUOQ P,et al.Test-Case Reduction for C Compiler Bugs[C]//Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation.New York:ACM,2012:335-346.
[6]HEO K,LEE W,PASHAKHANLOO P,et al.Effective Pro-gram Debloating via Reinforcement Learning[C]//Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2018:380-394.
[7]XIN Q,KIM M,ZHANG Q,et al.Subdomain-based Generality-Aware Debloating[C]//Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering.New York:ACM,2020:224-236.
[8]SHARIF H,ABUBAKAR M,GEHANI A,et al.TRIMMER:Application Specialization for Code Debloating[C]//Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering.New York:ACM,2018:329-339.
[9]QUACH A,PRAKASH A,YAN L.Debloating Softwarethrough Piece-Wise Compilation and Loading[C]//Proceedings of the 27th USENIX Security Symposium.USENIX Association,2018:869-886.
[10]PORTER C,MURURU G,BARUA P,et al.BlankIt Library Debloating:Getting What You Want instead of Cutting What You doŃt[C]//Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation.New York:ACM,2020:164-180.
[11]BISWAS P,BUROW N,PAYER M.Code Specializationthrough Dynamic Feature Observation[C]//Proceedings of the 11th ACM Conference on Data and Application Security and Privacy.New York:ACM,2021:257-268.
[12]QIAN C,KOO H,OH C,et al.Slimium:Debloating the Chro-mium Browser with Feature Subsetting[C]//Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security.New York:ACM,2020:461-476.
[13]WU J,WU R,ANTONIOLI D,et al.LIGHTBLUE:Automatic Profile-Aware Debloating of Bluetooth Stacks[C]//Proceedings of the 30th USENIX Security Symposium.USENIX Association,2021:339-356.
[14]ZHANG Y,PANG C,PORTOKALIDIS G,et al.DebloatingAddress Sanitizer[C]//Proceedings of the 31st USENIX Secu-rity Symposium.USENIX Association,2022:4345-4363.
[15]AGADAKOS I,JIN D,WILLIAMS-KING D,et al.Nibbler:Debloating Binary Shared Libraries[C]//Proceedings of the 35th Annual Computer Security Applications Conference.New York:ACM,2019:70-83.
[16]REDINI N,WANG R,MACHIRY A,et al.BinTrimmer:Towards Static Binary Debloating through Abstract Interpretation[C]//Proceedings of International Conference on Detection of Intrusions and Malware,and Vulnerability Assessment.Berlin:Springer,2019:482-501.
[17]ZHANG H,REN M,LEI Y,et al.One Size does not Fit All:Security Hardening of MIPS Embedded Systems via Static Binary Debloating for Shared Libraries[C]//Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems.New York:ACM,2022:255-270.
[18]QIAN C,HU H,ALHARTHI M,et al.RAZOR:A Framework for Post-Deployment Software Debloating[C]//Proceedings of the 28th USENIX Security Symposium.USENIX Association,2019:1733-1750.
[19]CHRISTENSEN J,ANGHEL M,TAGLANG R,et al.DECAF:Automatic,Adaptive De-bloating and Hardening of COTS Firmware[C]//Proceedings of the 29th USENIX Security Sympo-sium.USENIX Association,2020:1713-1730.
[20]HU Z,DOLAN-GAVITT B.IRQDebloat:Reducing Driver Attack Surface in Embedded Devices[C]//Proceedings of 2022 IEEE Symposium on Security and Privacy.Piscataway:IEEE,2022:1608-1622.
[21]ZELLER A,HILDEBRANDT R.Simplifying and isolating fai-lure-inducing input [J].IEEE Transactions on Software Engineering,2002,28(2):183-200.
[22]TAN J,PANG J,SHAN Z,et al.Redundant instruction optimization algorithm in binary translation [J].Journal of Computer Research and Development,2017,54(9):1931-1944.
[23]JOSHUA P.boofuzz:Network Protocol Fuzzing for Humans.[EB/OL].(2020-10-21)[2023-05-13].https://boofuzz.readthedocs.io/en/stable/.
[24]DUTRA R,GOPINATH R,ZELLER A.Format Fuzzer:Effec-tive Fuzzing of Binary File Formats[J].arXiv:2109.11277,2021.
[25]The AFL Team.Technical Whitepaper for afl-fuzz.[EB/OL].https://lcamtuf.coredump.cx/afl/technical_details.txt.
[26]The Preeny Team.Preeny [EB/OL].(2021-07-04)[2023-05-13].https://github.com/zardus/preeny.
[27]The DynamoRIO Team.DynamoRIO.[EB/OL].(2023-06-29)[2023-06-30].https://dynamorio.org/.
[28]CORDELLA L,FOGGIA P,SANSONE C,et al.An ImprovedAlgorithm for Matching Large Graphs[C]//Proceedings of the 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition.2001:149-159.
[29]KIM S,SUN C,ZENG D,et al.Refining Indirect Call Targets at the Binary Level[C]//Proceedings of Network and Distributed Systems Security.Internet Society,2021.
[30]KIM S,ZENG D,SUN C,et al.BinPointer:Towards Precise,Sound,and Scalable Binary-Level Pointer Analysis[C]//Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction.New York:ACM,2022:169-180.
[31]ZHANG M,SEKAR R.Control Flow Integrity for COTS Binaries[C]//Proceedings of the 22nd USENIX Security Sympo-sium.USENIX Association,2013:337-352.
[32]Free Software Foundation,Inc.The GNU C Library- Function and Macro Index.[EB/OL].https://www.gnu.org/software/libc/manual/html_node/Function-Index.html.
[33]WILLIAMS-KING D,KOBAYASHI H,WILLIAMS-KING K,et al.Egalito:Layout-Agnostic Binary Recompilation[C]//Proceedings of the 25th International Conference on Architectural Support for Programming Languages and Operating Systems.New York:ACM,2020:133-147.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!