计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 75-84.doi: 10.11896/jsjkx.210300086

• 计算机软件 • 上一篇    下一篇

基于次模函数最大化的测试用例集约简

文进, 张星宇, 沙朝锋, 刘艳君   

  1. 复旦大学计算机科学技术学院 上海200433
  • 收稿日期:2021-03-09 修回日期:2021-05-07 出版日期:2021-12-15 发布日期:2021-11-26
  • 通讯作者: 沙朝锋(cfsha@fudan.edu.cn)
  • 作者简介:wenjin_fudan@foxmail.com
  • 基金资助:
    国家重点研发计划(2018YFB0904503)

Test Suite Reduction via Submodular Function Maximization

WEN Jin, ZHANG Xing-yu, SHA Chao-feng, LIU Yan-jun   

  1. School of Computer Science,Fudan University,Shanghai 200433,China
  • Received:2021-03-09 Revised:2021-05-07 Online:2021-12-15 Published:2021-11-26
  • About author:WEN Jin,born in 1996,postgraduate.His main research interests include natu-ral language processing and software engineering.
    SHA Chao-feng,born in 1976,Ph.D,associate professor.His main research interests include machine learning and data mining,natural language processing.
  • Supported by:
    National Key Research and Development Program of China(2018YFB0904503).

摘要: 随着软件回归测试规模的不断增大和成本的不断增加,测试用例集约简对于提高软件的回归测试效率显得愈发重要。在选取测试用例子集时,需考虑该子集的代表性和多样性,并采用一个有效的算法来求解。针对该测试用例集约简问题,文中提出了一种基于次模函数最大化的算法SubTSR。尽管引入的离散优化问题是NP-hard问题,但文中利用其目标函数的次模性,采用启发式贪心搜索,求得有近似度保证的次优解。在15个数据集上对SubTSR算法与其他测试用例集约简算法展开实验,针对平均错误检出率、错误检测损失率、首次错误检出位等指标,尝试改变LDA处理中的主题个数以及衡量测试用例相似度的距离,以验证SubTSR算法的有效性。实验结果表明,SubTSR算法在错误检出性能上较其他算法有着较大提升,且在多个数据集上的表现保持相对稳定。在主题个数变化引起文本表示变化时,采用曼哈顿距离的SubTSR算法的性能相较其他算法仍能保持相对稳定。

关键词: 测试用例集约简, 次模函数, 错误检测, 软件测试, 主题模型

Abstract: As regression testing size and cost increase,test suite reduction becomes more important to promote its efficiency.Du-ring the selection of test suite subset,we are supposed to consider the representativeness and diversity of subset,and apply an effective algorithm to solve it.Aimed at test suite reduction,a submodular function maximization based algorithm,SubTSR,is proposed in this paper.Although the introduced discrete optimization problem is an NP-hard problem,the heuristic greedy search is used in this paper to find the suboptimal solution with approximation guarantee by exploiting the submodularity of the objective function.To validate the effectiveness of the SubTSR algorithm,the SubTSR algorithm is experimented on fifteen datasets with changes of topic count in LDA and distance for similarity measurement,and compared with other test suite reduction algorithms about the average percentage of fault-detection,fault detection loss rate,first faulty test's index and other metrics.The experiment results show that the SubTSR algorithm has significant improvement in fault detection performance compared with other algorithms,and its performance remains relatively stable on different datasets.Under the text representation change due to topic count change,the SubTSR combined with Manhattan distance keeps relatively stable compared with other algorithms.

Key words: Fault detection, Software testing, Submodular function, Test suite reduction, Topic model

中图分类号: 

  • TP391
[1]WONG W E,HORGAN J R,LONDON S,et al.A study of effective regression testing in practice[C]//Eighth International Symposium on Software Reliability Engineering.IEEE Compu-ter Society,1997:264-274.
[2]ROTHERMEL G,UNTCH R H,CHU C,et al.Prioritizing test cases for regression testing[J].IEEE Trans.Software Eng.,2001,27(10):929-948.
[3]YADAV D K,DUTTA S K.Test case prioritization using clustering approach for object oriented software[J].International Journal of Information System Modeling and Design (IJISMD),2019,10(3):92-109.
[4]KANDIL P,MOUSSA S,BADR N.Cluster-based test cases prio- ritization and selection technique for agile regression testing[J].Journal of Software:Evolution and Process,2017,29(6):e1794.
[5]KHAN S U R,LEE S P,JAVAID N,et al.A systematic review on test suite reduction:approaches,experiment's quality evaluation,and guidelines[J].IEEE Access,2018,6:11816-11841.
[6]BLEI D M,NG A Y,JORDAN M I.Latent dirichlet allocation[J].J.Mach.Learn.Res.,2003,3:993-1022.
[7]WANG R,LI Z,JIANG S,et al.Regression Test Case Prioritization Based on Fixed Size Candidate Set ART Algorithm[J].International Journal of Software Engineering and Knowledge Engineering,2020,30(3):291-320.
[8]SRIKANTH H,WILLIAMS L,OSBORNE J.System test case prioritization of new and regression test cases[C]//2005 International Symposium on Empirical Software Engineering(ISESE).IEEE Computer Society,2005:64-73.
[9]ELBAUM S,ROTHERMEL G,KANDURI S,et al.Selecting a cost-effective test case prioritization technique[J].Software Quality Journal,2004,12(3):185-210.
[10]THOMAS S W,HEMMATI H,HASSAN A E,et al.Static test case prioritization using topic models[J].Empir.Softw.Eng.,2014,19(1):182-212.
[11]MIRANDA B,CRUCIANI E,VERDECCHIA R,et al.FAST approaches to scalable similarity-based test case prioritization[C]//Proceedings of the 40th International Conference on Software Engineering.ACM,2018:222-232.
[12]CRUCIANI E,MIRANDA B,VERDECCHIA R,et al.Scalable approaches for test suite reduction[C]//Proceedings of the 41st International Conference on Software Engineering.IEEE/ACM,2019:419-429.
[13]HEMMATI H,ARCURI A,BRIAND L.Achieving scalable model-based testing through test case diversity[J].ACM Transac-tions on Software Engineering and Methodology (TOSEM),2013,22(1):1-42.
[14]MONDAL D,HEMMATI H,DUROCHER S.Exploring test suite diversification and code coverage in multi-objective test case selection[C]//8th IEEE International Conference on Software Testing,Verification and Validation(ICST 2015).Graz,Austria:IEEE Computer Society,2015:1-10.
[15]XIA C Y,WANG X Y,ZHANG Y.Test Case Prioritization Based on Multi-objective Optimization[J].Computer Science,2020,47(6):44-49.
[16]XIAO L,CHEN R S,MIAO H K,et al.Test Case Prioritization Combining Clustering Approach and Fault Prediction[J].Computer Science,2021,48(5):99-108.
[17]KRAUSE A,GOLOVIN D.Submodular function maximization [M]//Tractability:Practical Approaches to Hard Problems.Cambridge University Press,2014:71-104.
[18]NEMHAUSER G L,WOLSEY L A,FISHER M L.An analysis of approximations for maximizing submodular set functions-I[J].Math.Program.,1978,14(1):265-294.
[19]WEI K,IYER R,BILMES J.Submodularity in Data Subset Selection and Active Learning[C]//Proceedings of the 32nd International Conference on Machine Learning.Lille,France:PMLR,2015:1954-1963.
[20]DO H,ELBAUM S,ROTHERMEL G.Supporting controlled experimentation with testing techniques:an infrastructure and its potential impact[J].Empirical Software Engineering,2005,10(4):405-435.
[21]JUST R,JALALI D,ERNST M D.Defects4J:a database of existing faults to enable controlled testing studies for Java programs[C]//International Symposium on Software Testing and Analysis.ACM,2014:437-440.
[1] 滕俊元, 高猛, 郑小萌, 江云松.
噪声可容忍的软件缺陷预测特征选择方法
Noise Tolerable Feature Selection Method for Software Defect Prediction
计算机科学, 2021, 48(12): 131-139. https://doi.org/10.11896/jsjkx.201000168
[2] 孙昌爱, 张守峰, 朱维忠.
一种基于变异分析的BPEL程序故障定位技术
Mutation Based Fault Localization Technique for BPEL Programs
计算机科学, 2021, 48(1): 301-307. https://doi.org/10.11896/jsjkx.200900051
[3] 夏春艳, 王兴亚, 张岩.
基于多目标优化的测试用例优先级排序方法
Test Case Prioritization Based on Multi-objective Optimization
计算机科学, 2020, 47(6): 38-43. https://doi.org/10.11896/jsjkx.191100113
[4] 周波.
融合语义模型的二分网络推荐算法
Bipartite Network Recommendation Algorithm Based on Semantic Model
计算机科学, 2020, 47(11A): 482-485. https://doi.org/10.11896/jsjkx.200400028
[5] 王涵, 夏鸿斌.
LDA模型和列表排序混合的协同过滤推荐算法
Collaborative Filtering Recommendation Algorithm Mixing LDA Model and List-wise Model
计算机科学, 2019, 46(9): 216-222. https://doi.org/10.11896/j.issn.1002-137X.2019.09.032
[6] 居亚亚, 杨璐, 严建峰.
基于动态权重的LDA算法
LDA Algorithm Based on Dynamic Weight
计算机科学, 2019, 46(8): 260-265. https://doi.org/10.11896/j.issn.1002-137X.2019.08.043
[7] 张蕾,蔡明.
基于主题融合和关联规则挖掘的图像标注
Image Annotation Based on Topic Fusion and Frequent Patterns Mining
计算机科学, 2019, 46(7): 246-251. https://doi.org/10.11896/j.issn.1002-137X.2019.07.037
[8] 薄莉莉, 姜淑娟, 张艳梅, 王兴亚, 于巧.
并发缺陷检测技术研究进展
Research Progress on Techniques for Concurrency Bug Detection
计算机科学, 2019, 46(5): 13-20. https://doi.org/10.11896/j.issn.1002-137X.2019.05.002
[9] 李志博,李清宝,于磊,侯雪梅.
基于划分的自适应随机测试综述
Survey on Adaptive Random Testing by Partitioning
计算机科学, 2019, 46(3): 19-29. https://doi.org/10.11896/j.issn.1002-137X.2019.03.003
[10] 范道远, 孙吉红, 王炜, 涂吉屏, 何欣.
融合文本与分类信息的重复缺陷报告检测方法
Detection Method of Duplicate Defect Reports Fusing Text and Categorization Information
计算机科学, 2019, 46(12): 192-200. https://doi.org/10.11896/jsjkx.181102232
[11] 贾宁, 郑纯军.
基于注意力LSTM的音乐主题推荐模型
Model of Music Theme Recommendation Based on Attention LSTM
计算机科学, 2019, 46(11A): 230-235.
[12] 王蓁蓁, 刘嘉.
基于校正因子的随机TBFL方法
Stochastic TBFL Approach Based on Calibration Factor
计算机科学, 2019, 46(11): 161-167. https://doi.org/10.11896/jsjkx.191100503C
[13] 余圆圆, 巢文涵, 何跃鹰, 李舟军.
基于双语主题模型和双语词向量的跨语言知识链接
Cross-language Knowledge Linking Based on Bilingual Topic Model and Bilingual Embedding
计算机科学, 2019, 46(1): 238-244. https://doi.org/10.11896/j.issn.1002-137X.2019.01.037
[14] 张小川, 余林峰, 张宜浩.
基于LDA的多特征融合的短文本相似度计算
Multi-feature Fusion for Short Text Similarity Calculation Based on LDA
计算机科学, 2018, 45(9): 266-270. https://doi.org/10.11896/j.issn.1002-137X.2018.09.044
[15] 包晓安, 熊子健, 张唯, 吴彪, 张娜.
一种基于改进遗传算法的路径测试用例生成方法
Approach for Path-oriented Test Cases Generation Based on Improved Genetic Algorithm
计算机科学, 2018, 45(8): 174-178. https://doi.org/10.11896/j.issn.1002-137X.2018.08.031
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!