Computer Science, 2022, Vol. 49, Issue (11): 30-38. doi: 10.11896/jsjkx.211100177

• Computer Software •

Decision Tree Algorithm-based API Misuse Detection

LI Kang-le1, REN Zhi-lei1,2, ZHOU Zhi-de1, JIANG He1   

  1. School of Software Technology, Dalian University of Technology, Dalian, Liaoning 116600, China
  2. Key Laboratory of Software Development and Verification Technology of High Security System, Ministry of Industry and Information Technology (Nanjing University of Aeronautics and Astronautics), Nanjing 211106, China
  • Received:2021-11-17 Revised:2022-06-01 Online:2022-11-15 Published:2022-11-03
  • About author: LI Kang-le, born in 1997, postgraduate, is a student member of the China Computer Federation. His main research interests include intelligent software engineering and data mining.
    REN Zhi-lei, born in 1984, Ph.D, associate professor, is a member of the China Computer Federation. His main research interests include evolutionary computation, automatic algorithm configuration, and mining software repositories.
  • Supported by:
    Fundamental Research Funds for the Central Universities (NJ2020022), National Natural Science Foundation of China (62032004, 62072068), and National Key Research and Development Program of China (2018YFB1003900).

Abstract: Application programming interfaces (APIs) improve software development efficiency by letting developers reuse existing software frameworks and libraries. However, using an API correctly requires satisfying many constraints, such as call order and exception handling. Violating these constraints causes API misuse, which may result in software crashes, errors, or vulnerabilities. Although many API misuse detection techniques have been proposed, they still face two challenges: 1) API usage specifications are difficult to acquire, and 2) detecting many different types of API misuse at the same time is difficult. To address these challenges, a decision tree algorithm-based API misuse detection method is proposed. First, the API usage source code is converted into an API usage graph, and the API usage specification is mined from the graph, which addresses the first challenge. Second, an API usage decision tree is constructed from the mined specification information, and pruning strategies are incorporated to improve its generalization ability. Finally, the detection phase combines coarse-grained and fine-grained detection to improve the detection capability of the API usage decision tree, which addresses the second challenge. Experimental results show that the proposed approach can detect API misuse defects to a certain extent.
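The decision-tree step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes API usages have already been reduced to boolean features, uses a plain ID3-style information-gain split, and stands in a simple "stop when a split gains nothing" rule for the pruning strategies mentioned above. The feature names (`guard_before_call`, `handles_exception`, `closes_resource`) and the six toy usages are hypothetical, invented only for illustration. Graph construction, specification mining, and the coarse-grained and fine-grained detection phases are not modeled.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the rows on one feature."""
    total = len(rows)
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, features):
    """Build an ID3-style tree; a leaf is a label, a node is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    if information_gain(rows, labels, best) <= 0:
        return majority  # assumed stand-in for pruning: no useful split left
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     remaining)
    return {"feature": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Walk the tree; fall back to the node's majority label on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical toy data: each row describes one API usage site.
FEATURES = ["guard_before_call", "handles_exception", "closes_resource"]
USAGES = [
    ({"guard_before_call": True,  "handles_exception": True,  "closes_resource": True},  "correct"),
    ({"guard_before_call": True,  "handles_exception": False, "closes_resource": True},  "correct"),
    ({"guard_before_call": False, "handles_exception": True,  "closes_resource": True},  "misuse"),
    ({"guard_before_call": True,  "handles_exception": True,  "closes_resource": False}, "misuse"),
    ({"guard_before_call": False, "handles_exception": False, "closes_resource": False}, "misuse"),
    ({"guard_before_call": True,  "handles_exception": False, "closes_resource": True},  "correct"),
]

rows = [features for features, _ in USAGES]
labels = [label for _, label in USAGES]
tree = build_tree(rows, labels, FEATURES)

candidate = {"guard_before_call": True, "handles_exception": False, "closes_resource": False}
print(classify(tree, candidate))
```

In this toy setting a usage is labeled correct only when the call is guarded and the resource is closed, so the learned tree flags the unclosed candidate as a misuse. A realistic detector would derive its features from the mined API usage graph rather than from hand-written flags.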

Key words: API misuse, Decision tree, Specification mining, Bug detection

CLC Number: TP311