Computer Science ›› 2024, Vol. 51 ›› Issue (7): 389-396. doi: 10.11896/jsjkx.230300117

• Information Security •

PDF Malicious Indicators Extraction Technique Based on Improved Symbolic Execution

SONG Enzhou, HU Tao, YI Peng, WANG Wenbo   

  1. National Digital Switching System Engineering Technological R&D Center, Zhengzhou 450001, China
  • Received: 2023-03-14 Revised: 2023-07-03 Online: 2024-07-15 Published: 2024-07-10
  • About author: SONG Enzhou, born in 1998, postgraduate. His main research interests include malicious document analysis, binary code analysis and vulnerability mining.
    HU Tao, born in 1993, Ph.D., assistant researcher. His main research interests include intrinsic security, intrusion detection and SDN.
  • Supported by:
    National Natural Science Foundation of China (62176264).

Abstract: Malicious PDF documents are a common attack vector used by APT organizations. Analyzing indicators extracted from the embedded JavaScript code is an important means of determining whether a document is malicious. However, attackers can use heavy obfuscation, sandbox detection and other evasion methods to interfere with analysis. This paper therefore applies symbolic execution to PDF indicator extraction. We propose a PDF malicious indicator extraction technique based on improved symbolic execution and implement SYMBPDF, an indicator extraction system consisting of three modules: code parsing, symbolic execution and indicator extraction. The code parsing module extracts and reassembles inline JavaScript code. In the symbolic execution module, we design a code rewriting method that forces branch shifting, which improves the code coverage of symbolic execution; we also design a concurrency strategy and two constraint solving optimization methods to improve efficiency. The indicator extraction module integrates and records malicious indicators. Indicators are extracted from 1,271 malicious samples and evaluated. The indicator extraction success rate is 92.2% and the indicator effectiveness is 91.7%; compared with the system before optimization, code coverage is 8.5% higher and system performance is 32.3% higher.

Key words: Malicious documents, JavaScript code, Indicator extraction, Symbolic execution, Code rewriting, Constraint solving optimization

CLC Number: 

  • TP311
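
The code rewriting idea mentioned in the abstract (forcing branches so that code guarded by environment or sandbox checks is still reached) can be illustrated with a minimal sketch. This is not the SYMBPDF implementation: the paper targets JavaScript embedded in PDF files and combines rewriting with symbolic execution and constraint solving, whereas the sketch below uses Python's `ast` module as a stand-in, rewrites a toy Python script, and simply enumerates every branch decision. The names `SAMPLE`, `ForceBranches`, `extract_indicators` and `__forced__` are hypothetical, and the URLs are placeholders.

```python
import ast
import itertools
import re

# Toy script standing in for embedded code (assumption: Python instead of
# JavaScript). Each URL is only reached on one side of an environment check.
SAMPLE = '''
indicators = []
if viewer_version >= 9:
    indicators.append("http://example.invalid/stage1")
else:
    indicators.append("no-op")
if is_sandbox:
    pass
else:
    indicators.append("http://example.invalid/stage2")
'''


class ForceBranches(ast.NodeTransformer):
    """Replace every `if` test with a lookup into a list of forced decisions."""

    def __init__(self):
        self.count = 0  # number of rewritten branch points

    def visit_If(self, node):
        self.generic_visit(node)  # rewrite nested `if` statements first
        index = self.count
        self.count += 1
        # The original condition is discarded; the branch now depends only
        # on the externally supplied decision at position `index`.
        node.test = ast.parse(f"__forced__[{index}]", mode="eval").body
        return node


def extract_indicators(source):
    """Run the rewritten script under every branch assignment, collect URLs."""
    rewriter = ForceBranches()
    tree = rewriter.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    code = compile(tree, "<sample>", "exec")

    found = set()
    for decisions in itertools.product([True, False], repeat=rewriter.count):
        env = {"__forced__": decisions}
        exec(code, env)  # only ever run on trusted toy input
        for value in env.get("indicators", []):
            if re.match(r"https?://", value):
                found.add(value)
    return sorted(found)


if __name__ == "__main__":
    print(extract_indicators(SAMPLE))
```

Exhaustive enumeration grows as 2^n in the number of branch points, which is one reason a practical system would rely on path selection and constraint solving rather than this brute-force loop.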