Computer Science, 2022, Vol. 49, Issue (11): 39-48. doi: 10.11896/jsjkx.220200086

• Computer Software

AutoUnit: Automatic Test Generation Based on Active Learning and Prediction Guidance

ZHANG Da-lin1, ZHANG Zhe-wei2, WANG Nan1, LIU Ji-qiang1   

  1. School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China
  2. School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
  • Received: 2022-02-16 Revised: 2022-05-19 Online: 2022-11-15 Published: 2022-11-03
  • About author: ZHANG Da-lin, born in 1983, Ph.D., associate professor, Ph.D. supervisor, is a member of China Computer Federation. His main research interests include software engineering theory and technology.
    ZHANG Zhe-wei, born in 1999, postgraduate. His main research interests include machine learning and automated software testing.
  • Supported by:
    Fundamental Research Funds for the Central Universities of Ministry of Education of China (2021QY010).

Abstract: Automated test case generation aims to reduce testing costs and is more efficient than manual test generation. Most existing testing tools treat all files in a software project equally, although defective files account for only a small part of the code base. If testers can identify the files that are more prone to defects, they can save considerable testing resources. To address this problem, this paper designs AutoUnit, a prediction-guided testing tool based on active learning. AutoUnit first predicts which files in the pool of files to be tested are defective. It then runs the testing tool on the most “suspicious” files, feeds the actual test results back to the prediction model, and updates the model before the next round of prediction. In addition, when the total number of defective files is unknown, AutoUnit can stop in time according to a configurable target recall rate: it estimates the total number of defective files from the files already tested, calculates the current recall rate, and decides whether to stop the prediction guidance, thereby ensuring testing efficiency. Experimental analysis shows that, when the same number of defective files are tested, the shortest and longest times taken by AutoUnit are 70.9% and 80.7%, respectively, of those of current mainstream testing tools. When the total number of defective files is unknown and the target recall rate is set to 95%, AutoUnit needs to check only 29.7% of the source code files to reach the same detection level as the latest version of Evosuite, and its testing time is only 34.6% of Evosuite's, which greatly reduces the cost of testing. The experimental results show that the method effectively improves testing efficiency.
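
The predict, test, feed back and stop cycle described in the abstract can be illustrated with a minimal Python sketch. It is not AutoUnit's implementation: the token-counting scorer stands in for the paper's prediction model, and the recall estimate (defects found divided by defects found plus the summed scores of untested files) is only one plausible way to estimate the unknown total. The names `score`, `guided_testing`, `pool` and `run_tests` are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Set


def score(tokens: List[str], bad: Counter, good: Counter) -> float:
    """Stand-in defect score in (0, 1): mean smoothed share of defective sightings per token."""
    if not tokens:
        return 0.5
    return sum((bad[t] + 1) / (bad[t] + good[t] + 2) for t in tokens) / len(tokens)


def guided_testing(pool: Dict[str, List[str]],
                   run_tests: Callable[[str], bool],
                   target_recall: float = 0.95,
                   batch_size: int = 1) -> List[str]:
    """Test files in order of predicted suspicion; return the files tested, in order.

    pool maps a file name to its tokens (assumed features);
    run_tests(file) is an assumed oracle that returns True if a defect is detected.
    """
    bad, good = Counter(), Counter()   # token counts from defective / clean files
    untested: Set[str] = set(pool)
    tested: List[str] = []
    found = 0
    while untested:
        # 1. Predict: score every untested file with the current model.
        scores = {f: score(pool[f], bad, good) for f in untested}
        # 2. Test the most suspicious files first (ties broken by name).
        batch = sorted(untested, key=lambda f: (-scores[f], f))[:batch_size]
        for f in batch:
            defective = run_tests(f)
            tested.append(f)
            untested.discard(f)
            found += defective
            # 3. Feed the actual result back into the model.
            (bad if defective else good).update(set(pool[f]))
        # 4. Estimate recall (assumed estimator) and stop at the target.
        expected_remaining = sum(score(pool[f], bad, good) for f in untested)
        if found and found / (found + expected_remaining) >= target_recall:
            break
    return tested
```

A lower `target_recall` stops the loop earlier at the risk of leaving defective files untested, which is the trade-off the abstract describes for the 95% setting.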

Key words: Test case generation, Pool-based active learning, Defect prediction model, Random test, Detection efficiency

CLC Number: 

  • TP391