Computer Science, 2024, Vol. 51, Issue (10): 208-217. doi: 10.11896/jsjkx.230700008

• Computer Software •

Robust Binary Program Debloating

DING Duo1,2, SUN Cong1, ZHENG Tao3   

  1 School of Cyber Engineering, Xidian University, Xi'an 710071, China
  2 The 54th Research Institute of China Electronics Technology Group Corporation Electronic Equipment, Shijiazhuang 050050, China
  3 AVIC XI'AN Aeronautics Computing Technique Research Institute, Xi'an 710068, China
  • Received: 2023-07-03 Revised: 2023-11-14 Online: 2024-10-15 Published: 2024-10-11
  • About author: DING Duo, born in 1995, master's degree, assistant engineer. His main research interests include software security.
    SUN Cong, born in 1982, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No. 28286M). His main research interests include software security, program analysis, and high-confidence software.
  • Supported by:
    National Natural Science Foundation of China (62272366) and Key Research and Development Program of Shaanxi Province (2023-YBGY-371).

Abstract: Frequently used functionalities usually constitute only a small portion of an application's functionality. The redundant code behind rarely used functionalities enlarges the application's attack surface and thus raises the risk of code reuse attacks. Binary program debloating identifies and removes such redundant code through binary analysis of the application, thereby reducing the attack surface. The state-of-the-art binary program debloating approach relies on manually crafted inputs to derive the initial control flows and uses heuristics to extend the binary control-flow graph for debloating, which limits its robustness and scalability. This paper proposes and implements a robust binary program debloating approach, RBdeb. RBdeb uses black-box fuzzing to derive highly robust valid execution traces of the binary and automatically categorizes similar library functions using a graph isomorphism algorithm. Starting from the control-flow subgraph of the initial execution traces, the proposed path discovery algorithm extends the binary control flows with the classified library function calls and generates a robust binary file as the debloating result. Experimental results demonstrate that RBdeb achieves higher path coverage and produces more robust debloated binaries than state-of-the-art approaches, that its path discovery algorithm and library function categorization are more scalable, and that it can effectively debloat large real-world applications.
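
The categorization step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not RBdeb's implementation: it assumes each library function's control-flow graph is available as a dict mapping every node to the set of its successors, it uses a brute-force permutation check as a stand-in for a real graph isomorphism algorithm, and all function names and example graphs are hypothetical.

```python
from itertools import permutations


def degree_signature(graph):
    """Sorted (in-degree, out-degree) pairs; isomorphic graphs share this signature."""
    indeg = {node: 0 for node in graph}
    for succs in graph.values():
        for succ in succs:
            indeg[succ] += 1
    return sorted((indeg[node], len(graph[node])) for node in graph)


def edge_set(graph):
    """All directed edges (u, v) of the graph."""
    return {(u, v) for u, succs in graph.items() for v in succs}


def is_isomorphic(g1, g2):
    """Brute-force directed-graph isomorphism; only practical for small graphs."""
    if len(g1) != len(g2) or degree_signature(g1) != degree_signature(g2):
        return False
    nodes1, edges1, edges2 = list(g1), edge_set(g1), edge_set(g2)
    for perm in permutations(g2):
        mapping = dict(zip(nodes1, perm))
        if {(mapping[u], mapping[v]) for u, v in edges1} == edges2:
            return True
    return False


def categorize(cfgs):
    """Group function names whose control-flow graphs are isomorphic."""
    categories = []
    for name, graph in cfgs.items():
        for category in categories:
            # Compare against the first member, the category's representative.
            if is_isomorphic(cfgs[category[0]], graph):
                category.append(name)
                break
        else:
            categories.append([name])
    return categories


# Hypothetical CFGs: two diamond-shaped functions and one straight-line function.
example_cfgs = {
    "func_a": {0: {1, 2}, 1: {3}, 2: {3}, 3: set()},
    "func_b": {"entry": {"then", "else"}, "then": {"exit"},
               "else": {"exit"}, "exit": set()},
    "func_c": {0: {1}, 1: {2}, 2: {3}, 3: set()},
}
categories = categorize(example_cfgs)
```

In this example the two diamond-shaped graphs fall into one category and the straight-line graph into another. The permutation search is exponential in the number of nodes, so a practical tool would substitute a dedicated matching algorithm for it.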

Key words: Program debloating, Binary analysis, Fuzzing, Binary rewriting, Program analysis

CLC Number: TP314