Computer Science ›› 2023, Vol. 50 ›› Issue (7): 27-37. doi: 10.11896/jsjkx.221100244

• Computer Software •

Rule-based Technique for Detecting Risky Dynamic Typing Code

CHEN Zhifei¹, HAO Yang¹, CHEN Lin², XIAO Liang¹

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • Received: 2022-11-29 Revised: 2023-03-09 Online: 2023-07-15 Published: 2023-07-05
  • Corresponding author: CHEN Zhifei (chenzhifei@njust.edu.cn)
  • About author: CHEN Zhifei, born in 1990, Ph.D, associate professor. Her main research interests include program analysis, software testing, and software maintenance.
  • Supported by:
    National Key Research and Development Program of China (2022YFF0712100), Natural Science Foundation of Jiangsu Province, China (BK20220937) and Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX22_0465).

Abstract: Python has seen explosive growth in adoption in recent years, especially in scientific computing and machine learning. While Python's dynamic typing gives developers powerful programming abstractions, it also allows type-related defects to accumulate in code bases. To reduce such defects, this paper analyzes and detects six kinds of risky dynamic typing code that can lead to type-related defects. It first formally describes a rule for each kind of risky dynamic typing code, then proposes a rule-based technique for detecting them, and finally evaluates the technique on 25 real-world open-source Python projects (more than 945 kLOC in total). The results show that risky dynamic typing code is widespread in open-source projects, that a single Python function may contain multiple instances of inconsistent variable types, and that the rule-based detection technique achieves high accuracy and good performance on Python projects. The detection technique and the findings of this work provide reference and support for the healthy evolution of dynamic typing features and for the quality assurance of software projects.

Key words: Python, Dynamic typing, Type defects, Open-source software, Risky dynamic typing code, Detection technique
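
To make the idea of rule-based detection concrete, the following is a minimal sketch of what a check for one kind of risky code named in the abstract, inconsistent variable types within a single function, might look like. It is not the paper's rule set or implementation: the abstract does not list the six rules, so the rule used here (a variable is assigned literals of more than one type in the same function) and the names `literal_type`, `find_inconsistent_variables`, and `SAMPLE` are assumptions made purely for illustration. Only the standard-library `ast` module is used.

```python
import ast
from collections import defaultdict


def literal_type(node):
    """Return the type name of a literal expression, or None if it is not a literal."""
    # None is skipped on purpose: "x = None" followed by a real value is a
    # common initialization idiom and would otherwise dominate the report.
    if isinstance(node, ast.Constant) and node.value is not None:
        return type(node.value).__name__
    if isinstance(node, (ast.List, ast.ListComp)):
        return "list"
    if isinstance(node, (ast.Dict, ast.DictComp)):
        return "dict"
    if isinstance(node, (ast.Set, ast.SetComp)):
        return "set"
    if isinstance(node, ast.Tuple):
        return "tuple"
    return None


def find_inconsistent_variables(source):
    """Map each function name to the variables assigned literals of several types.

    Assumed rule (hypothetical, for illustration only): a variable is risky if
    plain assignments in one function give it literals of more than one type.
    Nested functions are not separated from their enclosing function.
    """
    report = {}
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        seen = defaultdict(set)  # variable name -> set of literal type names
        for node in ast.walk(func):
            if not isinstance(node, ast.Assign):
                continue
            inferred = literal_type(node.value)
            if inferred is None:
                continue
            for target in node.targets:
                if isinstance(target, ast.Name):
                    seen[target.id].add(inferred)
        risky = {name: sorted(types) for name, types in seen.items() if len(types) > 1}
        if risky:
            report[func.name] = risky
    return report


SAMPLE = '''
def load(path):
    result = 0
    if path:
        result = "missing"
    items = []
    items = [1, 2]
    return result
'''

print(find_inconsistent_variables(SAMPLE))
# prints {'load': {'result': ['int', 'str']}}
```

The sketch is purely syntactic and only sees literal right-hand sides, so it would miss most real cases (for example values returned from calls); a practical detector would need type inference over non-literal expressions.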

CLC Number:

  • TP311