Test Case Prioritization Combining Clustering Approach and Fault Prediction

doi:10.11896/jsjkx.200400100

Computer Science, 2021, Vol. 48, Issue (5): 99-108.

XIAO Lei^1,2,3, CHEN Rong-shang^1,4, MIAO Huai-kou^2, HONG Yu^5

1 College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian 361024, China
2 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
3 Shanghai Key Laboratory of Computer Software Testing & Evaluating, Shanghai 201112, China
4 Engineering Research Center for Software Testing and Evaluation of Fujian Province, Xiamen, Fujian 361000, China
5 Fujian Yifei Information Technology Co., Ltd., Xiamen, Fujian 361000, China

Received: 2020-04-22  Revised: 2020-09-23  Online: 2021-05-15  Published: 2021-05-09
Corresponding author: CHEN Rong-shang (rschen@xmut.edu.cn)
About author: XIAO Lei, born in 1979, associate professor. Her main research interests include software testing technology. (lxiao@xmut.edu.cn)
CHEN Rong-shang, born in 1982, senior engineer. His main research interests include software defect prediction.
Supported by:
National Natural Science Foundation of China (61572306) and Open Project of Fujian Software Evaluation Engineering Technology Research Center (ST2019002).

Abstract

Abstract: In continuous integration (CI) environments, rapid software updates increase how often regression tests are executed, while the need for quick fault feedback places higher demands on the efficiency of regression testing. Test case prioritization (TCP) studies the relative importance of test cases and typically executes those with stronger fault detection capability first, so that faults are found earlier; it can therefore meet the need for quick feedback in CI. Fault prediction uses code features of the system under test and historical fault information to estimate the probability that faults will be found in a new version. Most traditional clustering-based TCP approaches do not consider how the number of clusters and the choice of feature subset affect the clustering result. This paper applies fault prediction to clustering-based prioritization: it first builds a correlation matrix between test cases and code, then clusters the test cases, and finally uses the fault prediction results together with the maximum-minimum distance strategy to guide inter-cluster and intra-cluster prioritization. The experimental results show that the number of clusters and the choice of clustering feature subset have some influence on prioritization effectiveness. When the best number of clusters and the best feature subset cannot be obtained, the proposed approach improves regression testing efficiency more effectively than a purely clustering-based TCP approach.

Key words: Test case prioritization, Regression testing, Cluster analysis, Defect prediction, Feature subset, Best number of clusters

CLC Number: TP311

Cite this article:

XIAO Lei, CHEN Rong-shang, MIAO Huai-kou, HONG Yu. Test Case Prioritization Combining Clustering Approach and Fault Prediction[J]. Computer Science, 2021, 48(5): 99-108. https://doi.org/10.11896/jsjkx.200400100


