Computer Science, 2025, Vol. 52, Issue (1): 393-400. doi: 10.11896/jsjkx.231100181

• Information Security •

Anti-semantic Analysis Script Fusion Technology

TIAN Bowen1,2, YANG Ju2, XIONG Xiaobing2, DUAN Shuang2, WEI Ran2

  1. School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China
  2. School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450001, China
  • Received: 2023-11-27 Revised: 2024-05-03 Online: 2025-01-15 Published: 2025-01-09
  • Corresponding author: XIONG Xiaobing (bingxiaoxiong@163.com)
  • About the authors: TIAN Bowen (tbw1999@gs.zzu.edu.cn), born in 1999, master. His main research interests include reverse engineering and software protection.
    XIONG Xiaobing, born in 1985, Ph.D., associate professor. His main research interests include reverse engineering and software protection.

Abstract: Script programs have become widely used in recent years. Because they are powerful, efficient to execute, simpler to write than binary programs, and smaller in size, they are increasingly common in today's network environment. Current script obfuscation techniques fall into three main types: encoding obfuscation, structural obfuscation, and encryption obfuscation. However, existing obfuscation methods leave conspicuous features and are at risk of being deobfuscated; once a script is deobfuscated, its functionality can be easily analyzed and understood. To address this, a script fusion technique that resists semantic analysis is proposed. Cover code with ordinary functionality and the target code to be protected are each divided into blocks and then deeply fused. The fused code contains the code of both scripts, whose semantics and logic are interleaved and interdependent, so understanding and analyzing it requires stronger semantic reasoning and contextual understanding. Experiments on PowerShell scripts show that the cyclomatic complexity of the fused scripts' control flow increases by 81.51% on average, greatly enhancing the obfuscation strength of the code. The technique effectively blurs script semantics and alters control flow characteristics, and it performs well against semantic analysis by ChatGPT: the core functionality of the target code is difficult to analyze and understand, which improves the survivability and persistence of the script program.
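The abstract reports its main result as an average increase in control-flow cyclomatic complexity but does not state how the metric is computed. The following is a minimal sketch, assuming McCabe's standard definition V(G) = E - N + 2P over a control-flow graph; the function names and the two toy graphs are hypothetical illustrations, not the authors' tooling or experimental data.

```python
from typing import Dict, List


def cyclomatic_complexity(cfg: Dict[str, List[str]], components: int = 1) -> int:
    """Return McCabe's V(G) = E - N + 2P for a graph given as an adjacency list."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2 * components


def relative_increase(before: int, after: int) -> float:
    """Return the percentage increase of `after` over `before`."""
    return (after - before) / before * 100.0


# Hypothetical toy graph: a script with a single branch.
original_cfg = {
    "entry": ["check"],
    "check": ["then", "exit"],
    "then": ["exit"],
    "exit": [],
}

# Hypothetical toy graph: the same script after one extra branch
# from cover code has been woven into its control flow.
fused_cfg = {
    "entry": ["check"],
    "check": ["then", "cover_check"],
    "then": ["cover_check"],
    "cover_check": ["cover_body", "exit"],
    "cover_body": ["exit"],
    "exit": [],
}

before = cyclomatic_complexity(original_cfg)
after = cyclomatic_complexity(fused_cfg)
increase = relative_increase(before, after)
```

In this toy case each added decision point raises V(G) by one, which is why interleaving the branches of two scripts raises the metric of the fused result.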

Key words: Code protection, Obfuscation, Code division, Fusion, Script program

CLC Number: TP311