计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 75-84.doi: 10.11896/jsjkx.210300086

• 计算机软件 • 上一篇    下一篇

基于次模函数最大化的测试用例集约简

文进, 张星宇, 沙朝锋, 刘艳君   

  1. 复旦大学计算机科学技术学院 上海200433
  • 收稿日期:2021-03-09 修回日期:2021-05-07 出版日期:2021-12-15 发布日期:2021-11-26
  • 通讯作者: 沙朝锋(cfsha@fudan.edu.cn)
  • 作者简介:wenjin_fudan@foxmail.com
  • 基金资助:
    国家重点研发计划(2018YFB0904503)

Test Suite Reduction via Submodular Function Maximization

WEN Jin, ZHANG Xing-yu, SHA Chao-feng, LIU Yan-jun   

  1. School of Computer Science,Fudan University,Shanghai 200433,China
  • Received:2021-03-09 Revised:2021-05-07 Online:2021-12-15 Published:2021-11-26
  • About author:WEN Jin,born in 1996,postgraduate.His main research interests include natu-ral language processing and software engineering.
    SHA Chao-feng,born in 1976,Ph.D,associate professor.His main research interests include machine learning and data mining,natural language processing.
  • Supported by:
    National Key Research and Development Program of China(2018YFB0904503).

摘要: 随着软件回归测试规模的不断增大和成本的不断增加,测试用例集约简对于提高软件的回归测试效率显得愈发重要。在选取测试用例子集时,需考虑该子集的代表性和多样性,并采用一个有效的算法来求解。针对该测试用例集约简问题,文中提出了一种基于次模函数最大化的算法SubTSR。尽管引入的离散优化问题是NP-hard问题,但文中利用其目标函数的次模性,采用启发式贪心搜索,求得有近似度保证的次优解。在15个数据集上对SubTSR算法与其他测试用例集约简算法展开实验,针对平均错误检出率、错误检测损失率、首次错误检出位等指标,尝试改变LDA处理中的主题个数以及衡量测试用例相似度的距离,以验证SubTSR算法的有效性。实验结果表明,SubTSR算法在错误检出性能上较其他算法有着较大提升,且在多个数据集上的表现保持相对稳定。在主题个数变化引起文本表示变化时,采用曼哈顿距离的SubTSR算法的性能相较其他算法仍能保持相对稳定。

关键词: 软件测试, 测试用例集约简, 错误检测, 主题模型, 次模函数

Abstract: As regression testing size and cost increase,test suite reduction becomes more important to promote its efficiency.Du-ring the selection of test suite subset,we are supposed to consider the representativeness and diversity of subset,and apply an effective algorithm to solve it.Aimed at test suite reduction,a submodular function maximization based algorithm,SubTSR,is proposed in this paper.Although the introduced discrete optimization problem is an NP-hard problem,the heuristic greedy search is used in this paper to find the suboptimal solution with approximation guarantee by exploiting the submodularity of the objective function.To validate the effectiveness of the SubTSR algorithm,the SubTSR algorithm is experimented on fifteen datasets with changes of topic count in LDA and distance for similarity measurement,and compared with other test suite reduction algorithms about the average percentage of fault-detection,fault detection loss rate,first faulty test's index and other metrics.The experiment results show that the SubTSR algorithm has significant improvement in fault detection performance compared with other algorithms,and its performance remains relatively stable on different datasets.Under the text representation change due to topic count change,the SubTSR combined with Manhattan distance keeps relatively stable compared with other algorithms.

Key words: Software testing, Test suite reduction, Fault detection, Topic model, Submodular function

中图分类号: 

  • TP391
[1]WONG W E,HORGAN J R,LONDON S,et al.A study of effective regression testing in practice[C]//Eighth International Symposium on Software Reliability Engineering.IEEE Compu-ter Society,1997:264-274.
[2]ROTHERMEL G,UNTCH R H,CHU C,et al.Prioritizing test cases for regression testing[J].IEEE Trans.Software Eng.,2001,27(10):929-948.
[3]YADAV D K,DUTTA S K.Test case prioritization using clustering approach for object oriented software[J].International Journal of Information System Modeling and Design (IJISMD),2019,10(3):92-109.
[4]KANDIL P,MOUSSA S,BADR N.Cluster-based test cases prio- ritization and selection technique for agile regression testing[J].Journal of Software:Evolution and Process,2017,29(6):e1794.
[5]KHAN S U R,LEE S P,JAVAID N,et al.A systematic review on test suite reduction:approaches,experiment's quality evaluation,and guidelines[J].IEEE Access,2018,6:11816-11841.
[6]BLEI D M,NG A Y,JORDAN M I.Latent dirichlet allocation[J].J.Mach.Learn.Res.,2003,3:993-1022.
[7]WANG R,LI Z,JIANG S,et al.Regression Test Case Prioritization Based on Fixed Size Candidate Set ART Algorithm[J].International Journal of Software Engineering and Knowledge Engineering,2020,30(3):291-320.
[8]SRIKANTH H,WILLIAMS L,OSBORNE J.System test case prioritization of new and regression test cases[C]//2005 International Symposium on Empirical Software Engineering(ISESE).IEEE Computer Society,2005:64-73.
[9]ELBAUM S,ROTHERMEL G,KANDURI S,et al.Selecting a cost-effective test case prioritization technique[J].Software Quality Journal,2004,12(3):185-210.
[10]THOMAS S W,HEMMATI H,HASSAN A E,et al.Static test case prioritization using topic models[J].Empir.Softw.Eng.,2014,19(1):182-212.
[11]MIRANDA B,CRUCIANI E,VERDECCHIA R,et al.FAST approaches to scalable similarity-based test case prioritization[C]//Proceedings of the 40th International Conference on Software Engineering.ACM,2018:222-232.
[12]CRUCIANI E,MIRANDA B,VERDECCHIA R,et al.Scalable approaches for test suite reduction[C]//Proceedings of the 41st International Conference on Software Engineering.IEEE/ACM,2019:419-429.
[13]HEMMATI H,ARCURI A,BRIAND L.Achieving scalable model-based testing through test case diversity[J].ACM Transac-tions on Software Engineering and Methodology (TOSEM),2013,22(1):1-42.
[14]MONDAL D,HEMMATI H,DUROCHER S.Exploring test suite diversification and code coverage in multi-objective test case selection[C]//8th IEEE International Conference on Software Testing,Verification and Validation(ICST 2015).Graz,Austria:IEEE Computer Society,2015:1-10.
[15]XIA C Y,WANG X Y,ZHANG Y.Test Case Prioritization Based on Multi-objective Optimization[J].Computer Science,2020,47(6):44-49.
[16]XIAO L,CHEN R S,MIAO H K,et al.Test Case Prioritization Combining Clustering Approach and Fault Prediction[J].Computer Science,2021,48(5):99-108.
[17]KRAUSE A,GOLOVIN D.Submodular function maximization [M]//Tractability:Practical Approaches to Hard Problems.Cambridge University Press,2014:71-104.
[18]NEMHAUSER G L,WOLSEY L A,FISHER M L.An analysis of approximations for maximizing submodular set functions-I[J].Math.Program.,1978,14(1):265-294.
[19]WEI K,IYER R,BILMES J.Submodularity in Data Subset Selection and Active Learning[C]//Proceedings of the 32nd International Conference on Machine Learning.Lille,France:PMLR,2015:1954-1963.
[20]DO H,ELBAUM S,ROTHERMEL G.Supporting controlled experimentation with testing techniques:an infrastructure and its potential impact[J].Empirical Software Engineering,2005,10(4):405-435.
[21]JUST R,JALALI D,ERNST M D.Defects4J:a database of existing faults to enable controlled testing studies for Java programs[C]//International Symposium on Software Testing and Analysis.ACM,2014:437-440.
[1] 滕俊元, 高猛, 郑小萌, 江云松. 噪声可容忍的软件缺陷预测特征选择方法[J]. 计算机科学, 2021, 48(12): 131-139.
[2] 孙昌爱, 张守峰, 朱维忠. 一种基于变异分析的BPEL程序故障定位技术[J]. 计算机科学, 2021, 48(1): 301-307.
[3] 夏春艳, 王兴亚, 张岩. 基于多目标优化的测试用例优先级排序方法[J]. 计算机科学, 2020, 47(6): 38-43.
[4] 周波. 融合语义模型的二分网络推荐算法[J]. 计算机科学, 2020, 47(11A): 482-485.
[5] 王涵, 夏鸿斌. LDA模型和列表排序混合的协同过滤推荐算法[J]. 计算机科学, 2019, 46(9): 216-222.
[6] 居亚亚, 杨璐, 严建峰. 基于动态权重的LDA算法[J]. 计算机科学, 2019, 46(8): 260-265.
[7] 张蕾,蔡明. 基于主题融合和关联规则挖掘的图像标注[J]. 计算机科学, 2019, 46(7): 246-251.
[8] 薄莉莉, 姜淑娟, 张艳梅, 王兴亚, 于巧. 并发缺陷检测技术研究进展[J]. 计算机科学, 2019, 46(5): 13-20.
[9] 李志博,李清宝,于磊,侯雪梅. 基于划分的自适应随机测试综述[J]. 计算机科学, 2019, 46(3): 19-29.
[10] 范道远, 孙吉红, 王炜, 涂吉屏, 何欣. 融合文本与分类信息的重复缺陷报告检测方法[J]. 计算机科学, 2019, 46(12): 192-200.
[11] 贾宁, 郑纯军. 基于注意力LSTM的音乐主题推荐模型[J]. 计算机科学, 2019, 46(11A): 230-235.
[12] 王蓁蓁, 刘嘉. 基于校正因子的随机TBFL方法[J]. 计算机科学, 2019, 46(11): 161-167.
[13] 余圆圆, 巢文涵, 何跃鹰, 李舟军. 基于双语主题模型和双语词向量的跨语言知识链接[J]. 计算机科学, 2019, 46(1): 238-244.
[14] 张小川, 余林峰, 张宜浩. 基于LDA的多特征融合的短文本相似度计算[J]. 计算机科学, 2018, 45(9): 266-270.
[15] 包晓安, 熊子健, 张唯, 吴彪, 张娜. 一种基于改进遗传算法的路径测试用例生成方法[J]. 计算机科学, 2018, 45(8): 174-178.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 贾熹滨,叶颖婕,陈军成. 基于关联规则的交通事故影响因素的挖掘[J]. 计算机科学, 2018, 45(6A): 447 -452 .
[2] 陈海燕. 基于多群智能算法的云计算任务调度策略[J]. 计算机科学, 2014, 41(Z6): 83 -86 .
[3] 张屹. 基于干道绿波效应协同策略的信号配时模糊控制[J]. 计算机科学, 2014, 41(Z6): 80 -82 .
[4] 郭西风,祝恩,周思航,申小龙,殷建平. 一种基于交点权重图的指纹焦点检测方法[J]. 计算机科学, 2016, 43(2): 35 -37 .
[5] 赵晓非,黄志球. 基于描述逻辑的CWM元数据冲突的检测和消解[J]. 计算机科学, 2010, 37(11): 166 -171 .
[6] 郭佳. 基于改进的人工神经网络对存储系统性能进行预测的方法[J]. 计算机科学, 2019, 46(6A): 52 -55 .
[7] 王生武,陈红梅. 基于粗糙集和改进鲸鱼优化算法的特征选择方法[J]. 计算机科学, 2020, 47(2): 44 -50 .
[8] 董明刚, 弓佳明, 敬超. 基于谱聚类的多目标进化社区发现算法研究[J]. 计算机科学, 2020, 47(6A): 461 -466 .
[9] 胡德凤, 张晨曦, 汪世涛, 赵钦佩, 李江峰. 基于深度信号处理和堆叠残差GRU的刀具磨损智能预测模型[J]. 计算机科学, 2021, 48(6): 175 -183 .
[10] 潘孝勤, 芦天亮, 杜彦辉, 仝鑫. 基于深度学习的语音合成与转换技术综述[J]. 计算机科学, 2021, 48(8): 200 -208 .