Computer Science, 2022, Vol. 49, Issue (6A): 474-479. doi: 10.11896/jsjkx.210600200

• Information Security •

Empirical Security Study of Native Code in Python Virtual Machines

JIANG Cheng-man1, HUA Bao-jian1, FAN Qi-liang1, ZHU Hong-jun2, XU Bo3, PAN Zhi-zhong1   

    1 School of Software Engineering, University of Science and Technology of China, Hefei 230000, China
    2 Anhui Institute of Information Technology, Wuhu, Anhui 241002, China
    3 Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei 230000, China
  • Online: 2022-06-10 Published: 2022-06-08
  • About author: JIANG Cheng-man, born in 1995, master. His main research interests include network and information security.
    HUA Bao-jian, born in 1979, Ph.D., assistant professor, graduate supervisor. His main research interests include programming language theory and implementation, computer and network security, etc.
  • Supported by:
    Graduate Education Innovation Program of USTC (2020YCJC41).

Abstract: The Python programming language and its ecosystem continue to play an important role in modern artificial intelligence systems such as machine learning and deep learning, and Python is among the most popular implementation languages in modern machine learning infrastructures such as TensorFlow, PyTorch, Caffe, and CNTK. The security of Python virtual machines is therefore critical to the security of these machine learning systems. However, because the CPython virtual machine contains a huge native code base, studying the security vulnerability patterns in Python virtual machines and the techniques for fixing these vulnerabilities is a significant research challenge. This paper presents PyGuard, a novel vulnerability analysis framework that uses static program analysis techniques to analyze the security of native code in Python virtual machines. It also introduces a prototype implementation of the framework and reports the results of an empirical security study of the CPython virtual machine (version 3.9): we found 45 new security vulnerabilities, which demonstrates the effectiveness of the system. We have also conducted a thorough study of the vulnerability patterns and provide a taxonomy.

Key words: Native code, Program analysis, Python virtual machines, Security vulnerabilities

CLC Number: 

  • TP311