Decision Tree Algorithm-based API Misuse Detection

Computer Science, 2022, Vol. 49, Issue (11): 30-38. doi: 10.11896/jsjkx.211100177

LI Kang-le¹, REN Zhi-lei¹,², ZHOU Zhi-de¹, JIANG He¹

1 School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116600, China
2 Key Laboratory of Software Development and Verification Technology of High Security System, Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), Nanjing 211106, China

Received: 2021-11-17 Revised: 2022-06-01 Online: 2022-11-15 Published: 2022-11-03
Corresponding author: REN Zhi-lei (zren@dlut.edu.cn)
About author: LI Kang-le (kangleli@mail.dlut.edu.cn), born in 1997, postgraduate, is a student member of China Computer Federation. His main research interests include intelligent software engineering and data mining.
REN Zhi-lei, born in 1984, Ph.D, associate professor, is a member of China Computer Federation. His main research interests include evolutionary computation, automatic algorithm configuration, and mining software repositories.
Supported by:
Fundamental Research Funds for the Central Universities (NJ2020022), National Natural Science Foundation of China (62032004, 62072068) and National Key Research and Development Program of China (2018YFB1003900).

Abstract

Abstract: Reusing existing software frameworks or libraries through application programming interfaces (APIs) can effectively improve software development efficiency. However, using APIs correctly requires satisfying many constraints, such as call order and exception handling. Violating these constraints causes API misuse, which may result in software crashes, errors, or vulnerabilities. Although many API misuse detection techniques have been proposed, they still face two challenges: 1) API usage specifications are difficult to acquire, and 2) it is difficult to detect many different types of API misuse at the same time. To address these challenges, a decision tree algorithm-based API misuse detection method is proposed. First, the API usage source code is converted into an API usage graph, and API usage specifications are mined from the graph, which addresses the first challenge. Second, an API usage decision tree is constructed from the mined specification information, and pruning strategies are incorporated to improve its generalization ability. Finally, a combination of coarse-grained and fine-grained detection is proposed in the detection phase to improve the detection capability of the API usage decision tree, which addresses the second challenge. Experimental results show that the proposed approach can detect API misuse defects to a certain extent.
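The tree-construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the features extracted from the API usage graph or the pruning strategy used, so the boolean features, the toy data, and the use of a depth limit as a stand-in for pruning are all assumptions made for illustration. The sketch uses plain ID3-style induction (information gain) on hypothetical feature vectors and does not cover graph construction or the coarse-grained and fine-grained detection stages.

```python
from collections import Counter
from math import log2

# Hypothetical boolean features for one usage of an iterator/stream-like API.
# These names are assumptions; the abstract does not enumerate the real features.
FEATURES = ["checks_has_next", "wraps_in_try", "calls_close"]

# Toy training data (invented for illustration only).
ROWS = [
    {"checks_has_next": True,  "wraps_in_try": True,  "calls_close": True},
    {"checks_has_next": True,  "wraps_in_try": False, "calls_close": True},
    {"checks_has_next": True,  "wraps_in_try": True,  "calls_close": False},
    {"checks_has_next": False, "wraps_in_try": True,  "calls_close": True},
    {"checks_has_next": False, "wraps_in_try": False, "calls_close": False},
    {"checks_has_next": False, "wraps_in_try": False, "calls_close": True},
]
LABELS = ["correct", "correct", "correct", "misuse", "misuse", "misuse"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the data on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features, max_depth=2):
    """ID3-style induction; max_depth is a simple pre-pruning stand-in."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features or max_depth == 0:
        return majority  # leaf node: predicted label
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 1e-12:
        return majority  # no feature separates the classes any further
    node = {"feature": best, "default": majority, "children": {}}
    remaining = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, max_depth - 1)
    return node


def classify(tree, row):
    """Walk from the root to a leaf; unseen values fall back to the majority label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


tree = build_tree(ROWS, LABELS, FEATURES)
unseen = {"checks_has_next": False, "wraps_in_try": True, "calls_close": False}
print(tree["feature"], classify(tree, unseen))
```

On this toy data the split with the highest information gain is the hypothetical `checks_has_next` feature, so a usage that skips the check is flagged regardless of the other two features. A real detector would derive its features from mined specifications rather than hand-written flags.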

Key words: API Misuse, Decision tree, Specification mining, Bug detection

CLC Number: TP311

LI Kang-le, REN Zhi-lei, ZHOU Zhi-de, JIANG He. Decision Tree Algorithm-based API Misuse Detection[J]. Computer Science, 2022, 49(11): 30-38. https://doi.org/10.11896/jsjkx.211100177
