Computer Science, 2022, Vol. 49, Issue (6A): 474-479. doi: 10.11896/jsjkx.210600200

• Information Security •

Empirical Security Study of Native Code in Python Virtual Machines

JIANG Cheng-man1, HUA Bao-jian1, FAN Qi-liang1, ZHU Hong-jun2, XU Bo3, PAN Zhi-zhong1

  1 School of Software Engineering, University of Science and Technology of China, Hefei 230000, China
    2 Anhui Institute of Information Technology, Wuhu, Anhui 241002, China
    3 Hefei National Laboratory for Physical Sciences at the Microscale, University of Science and Technology of China, Hefei 230000, China
  • Online: 2022-06-10 Published: 2022-06-08
  • Corresponding author: HUA Bao-jian (bjhua@ustc.edu.cn)
  • About author: (sa148@mail.ustc.edu.cn)
    JIANG Cheng-man, born in 1995, master. His main research interests include network and information security.
    HUA Bao-jian, born in 1979, Ph.D., assistant professor, graduate supervisor. His main research interests include programming language theory and implementation, computer and network security, etc.
  • Supported by:
    Graduate Education Innovation Program of USTC (2020YCJC41).

Abstract: The Python language and its ecosystem are an important foundation of artificial intelligence systems such as machine learning and deep learning, and Python has become the preferred implementation language of mainstream machine learning frameworks such as TensorFlow, PyTorch, Caffe and CNTK. The security and reliability of the Python virtual machine therefore underpin the security of these frameworks. However, CPython, the official Python virtual machine, contains a large body of native code written in C/C++; the vulnerability patterns in this code have not yet been thoroughly studied or understood, and systematic techniques for analyzing and fixing them are still lacking. To address this, this paper presents PyGuard, a vulnerability analysis framework that uses static program analysis to scan the native code in Python virtual machines for security vulnerabilities. Using a prototype implementation of PyGuard, we conducted an empirical security study of CPython and found 45 new security vulnerabilities in the latest version of the virtual machine (CPython 3.9), demonstrating the framework's effectiveness on the native code of a real-world Python virtual machine. Based on this study, we analyze the vulnerability patterns in the virtual machine's native code, give a taxonomy of them, and suggest how to fix them.

Key words: Native code, Program analysis, Python virtual machines, Security vulnerabilities

CLC number:

  • TP311
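The abstract describes PyGuard only at a high level: static analysis that scans CPython's C/C++ native code for vulnerabilities. To make the idea of a static vulnerability scan concrete, here is a deliberately toy, regex-based sketch in Python. It is not PyGuard's implementation: the bug pattern it looks for (an allocation result used without a NULL check), the list of allocator names, and every function name below (`find_unchecked_allocs`, `null_checked`, `Finding`, the C sample) are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical set of allocators whose result may be NULL (illustrative assumption).
ALLOC_FUNCS = ("malloc", "calloc", "realloc",
               "PyMem_Malloc", "PyMem_Realloc", "PyObject_Malloc")

# Matches "var = alloc(" with an optional C cast such as "(char *)".
ALLOC_RE = re.compile(
    r"(?P<var>[A-Za-z_]\w*)\s*=\s*(?:\([^()]*\)\s*)?"
    r"(?P<func>" + "|".join(ALLOC_FUNCS) + r")\s*\("
)


@dataclass
class Finding:
    line: int  # 1-based line number of the allocation
    var: str   # variable that receives the allocation result
    func: str  # allocator that was called


def null_checked(var: str, text: str) -> bool:
    """Return True if `text` tests `var` against NULL (var == NULL, NULL != var, !var)."""
    v = re.escape(var)
    patterns = (
        rf"\b{v}\s*[!=]=\s*NULL\b",
        rf"\bNULL\s*[!=]=\s*{v}\b",
        rf"!\s*{v}\b",
    )
    return any(re.search(p, text) for p in patterns)


def find_unchecked_allocs(source: str) -> List[Finding]:
    """Flag allocations whose result is never NULL-tested later in `source`.

    Toy heuristic: scans raw text, ignores function boundaries and control
    flow, and only recognizes the allocators listed in ALLOC_FUNCS.
    """
    lines = source.splitlines()
    findings: List[Finding] = []
    for i, line in enumerate(lines):
        for m in ALLOC_RE.finditer(line):
            rest = "\n".join(lines[i + 1:])
            if not null_checked(m.group("var"), rest):
                findings.append(Finding(i + 1, m.group("var"), m.group("func")))
    return findings


# Hypothetical C extension code: the first function uses `buf` unchecked.
SAMPLE = """
static PyObject *
copy_bytes(const char *src, Py_ssize_t n)
{
    char *buf = (char *)PyMem_Malloc(n);
    memcpy(buf, src, n);          /* dereference without a NULL check */
    PyObject *res = PyBytes_FromStringAndSize(buf, n);
    PyMem_Free(buf);
    return res;
}

static int
safe_copy(const char *src, Py_ssize_t n)
{
    char *tmp = PyMem_Malloc(n);
    if (tmp == NULL) {
        PyErr_NoMemory();
        return -1;
    }
    memcpy(tmp, src, n);
    PyMem_Free(tmp);
    return 0;
}
"""

if __name__ == "__main__":
    for f in find_unchecked_allocs(SAMPLE):
        print(f"line {f.line}: result of {f.func}() stored in '{f.var}' is never NULL-checked")
```

On the sample, only the allocation into `buf` (line 5) is reported; `tmp` is tested against NULL and passes. A real analyzer would work on a parsed AST or an intermediate representation, track control flow and scope per function, and cover many more patterns. This sketch only checks whether a NULL test appears anywhere later in the text, so it can both miss bugs and raise false alarms.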