Computer Science, 2024, Vol. 51, Issue (7): 389-396. doi: 10.11896/jsjkx.230300117
宋恩舟, 胡涛, 伊鹏, 王文博
SONG Enzhou, HU Tao, YI Peng, WANG Wenbo
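Abstract: Malicious PDF documents are an attack vector commonly used by APT groups, and extracting and analyzing indicators from their embedded JavaScript code is an important means of determining whether a document is malicious. Attackers, however, can evade such analysis through heavy obfuscation and through virtual-machine and sandbox detection. This paper therefore applies symbolic execution to PDF indicator extraction, proposes a malicious PDF indicator extraction technique based on optimized symbolic execution, and implements SYMBPDF, an indicator extraction system made up of three modules: code parsing, symbolic execution, and indicator extraction. The code parsing module extracts and reassembles the embedded JavaScript code. The symbolic execution module uses a code rewriting method that forces branch transfers to raise the code coverage of symbolic execution, and adds a concurrency strategy and two constraint-solving optimizations to improve execution efficiency. The indicator extraction module consolidates and records the malicious indicators. In an evaluation on 1,271 malicious samples, the indicator extraction success rate was 92.2% and the effectiveness was 91.7%; compared with the unoptimized system, code coverage improved by 8.5% and system performance improved by 32.3%.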
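The abstract does not spell out how the code rewriting forces branch transfers, so the following is only a minimal sketch of the general idea of forced branching, not SYMBPDF's actual implementation. It assumes a hypothetical JavaScript hook `__force(id, cond)` and a hypothetical plan array `__plan`: each `if (...)` condition in the extracted script is wrapped in the hook, and a generated prelude then dictates which way every branch goes, so that code hidden behind environment checks can still be reached. The scanner is deliberately naive and ignores `if (` occurrences inside strings or comments.

```python
import itertools
import re


def rewrite_branches(js_source: str) -> tuple[str, int]:
    """Wrap each `if (...)` condition in a hypothetical __force(id, cond) hook.

    Returns the rewritten source and the number of branch sites found.
    """
    out = []
    pos = 0   # start of the text not yet copied to `out`
    site = 0  # running identifier for each rewritten branch
    for match in re.finditer(r"\bif\s*\(", js_source):
        start = match.end()  # index just past the opening parenthesis
        if start < pos:
            continue  # this `if` sits inside a condition already consumed
        depth = 1
        i = start
        # Walk forward to the parenthesis that closes the condition.
        while i < len(js_source) and depth > 0:
            if js_source[i] == "(":
                depth += 1
            elif js_source[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            break  # unbalanced parentheses: leave the rest untouched
        cond = js_source[start:i - 1]
        out.append(js_source[pos:start])
        out.append(f"__force({site}, {cond})")
        pos = i - 1  # resume copying at the closing parenthesis
        site += 1
    out.append(js_source[pos:])
    return "".join(out), site


def forced_outcomes(n_sites: int):
    """Enumerate every true/false assignment for n_sites branch sites."""
    return itertools.product([False, True], repeat=n_sites)


def make_prelude(outcomes) -> str:
    """Build a JavaScript prelude that makes __force follow the given plan."""
    plan = ", ".join("true" if taken else "false" for taken in outcomes)
    return (
        f"var __plan = [{plan}];\n"
        "function __force(id, cond) { return __plan[id]; }\n"
    )


if __name__ == "__main__":
    src = "if (app.viewerVersion > 9) { payload(); } else { benign(); }"
    rewritten, n = rewrite_branches(src)
    for outcomes in forced_outcomes(n):
        # Each prelude + rewritten script pair is one forced execution path.
        print(make_prelude(outcomes) + rewritten)
```

Enumerating every assignment grows exponentially with the number of branch sites, which is one reason a practical system would pair this kind of rewriting with concurrency and solver-side optimizations of the sort the abstract mentions.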