Computer Science, 2021, Vol. 48, Issue (5): 75-85. doi: 10.11896/jsjkx.200900062

• Computer Software

Empirical Study on Stability of Clone Code Sets Based on Class Granularity

ZHANG Jiu-jie1, CHEN Chao1, NIE Hong-xuan1, XIA Yu-qin1, ZHANG Li-ping2, MA Zhan-fei1

  1. 1 Department of Computer Science & Technology, Baotou Teachers' College, Baotou, Inner Mongolia 014030, China
    2 School of Computer Science & Technology, Inner Mongolia Normal University, Hohhot 010022, China
  • Received: 2020-09-08 Revised: 2020-12-02 Online: 2021-05-15 Published: 2021-05-09
  • Corresponding author: XIA Yu-qin (xiateacher@163.com)
  • About author: ZHANG Jiu-jie, born in 1990, master, is a member of China Computer Federation. His main research interests include software engineering, software maintenance and evolution, program source code analysis, and clone code detection and management. (zhangjiujie@bttc.edu.cn)
    XIA Yu-qin, born in 1973, master, associate professor. Her main research interests include software engineering, artificial intelligence and computer education.
  • Supported by:
    National Natural Science Foundation of China (61762071, 61462071) and Natural Science Foundation of the Inner Mongolia Autonomous Region, China (2014MS0613, 2015MS0606, 2016MS0614, 2019MS06037).

Abstract: Research on clone code is closely related to many problems in software engineering. Existing studies on clone code stability mainly compare clone code with non-clone code, or compare different clone types with one another; few studies relate clone stability to the object-oriented classes across which clone sets are distributed. This paper presents an empirical study on the stability of clone sets at object-oriented class granularity. It poses four research questions about clone set stability. To answer them, all clone sets are divided into three groups (intra-class, inter-class and hybrid-class clone sets), and their stability is compared using 9 evolution patterns from 4 perspectives. First, clone code fragments are detected in all revisions of the subject systems, and each fragment is tagged with the class to which it belongs. Next, clone sets in adjacent revisions are mapped to each other on the basis of clone fragment mapping, and the evolution patterns of the clone sets are identified and tagged. The mapping relations and evolution patterns are then combined into a clone genealogy, and the stability of the three groups is calculated from the different perspectives. Finally, the stability differences among the three groups are compared and analyzed. Experiments on nearly 7,700 revisions of seven object-oriented open-source software systems show that about 60% of intra-class clone sets have a life cycle rate of 50% or more, whereas only about 35% of inter-class clone sets and about 35% of hybrid-class clone sets do. Intra-class clone sets change least often; inter-class clone sets undergo slightly more merging, branching and late propagation; and hybrid-class clone sets undergo the most fragment removals, consistent changes and inconsistent changes. Overall, intra-class clone sets are the most stable, while hybrid-class clone sets may need to be tracked closely or refactored first during software evolution. The stability analysis method and the findings of this work provide a reference for clone tracking, maintenance, refactoring and other clone management activities.

Key words: Clone code, Hybrid-class clone, Inter-class clone, Intra-class clone, Software evolution, Software maintenance, Stability
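
The grouping and the life cycle rate mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a clone set is intra-class when all of its fragments lie in one class, inter-class when every fragment lies in a different class, and hybrid otherwise, and that the life cycle rate is the number of revisions in which a clone set exists divided by the total number of revisions. All function names and the sample data are hypothetical.

```python
from collections import Counter


def classify_clone_set(fragment_classes):
    """Group a clone set by the classes that contain its fragments.

    fragment_classes: one class name per clone fragment.
    The three-way rule below is an assumption made for illustration.
    """
    if len(fragment_classes) < 2:
        raise ValueError("a clone set needs at least two fragments")
    counts = Counter(fragment_classes)
    if len(counts) == 1:
        return "intra-class"          # every fragment in the same class
    if all(n == 1 for n in counts.values()):
        return "inter-class"          # every fragment in a different class
    return "hybrid"                   # some fragments share a class, some do not


def life_cycle_rate(revisions_alive, total_revisions):
    """Assumed definition: share of all revisions in which the clone set exists."""
    if total_revisions <= 0 or not 0 <= revisions_alive <= total_revisions:
        raise ValueError("invalid revision counts")
    return revisions_alive / total_revisions


def share_long_lived(clone_sets, total_revisions, threshold=0.5):
    """Per group, the fraction of clone sets whose life cycle rate >= threshold.

    clone_sets: iterable of (fragment_classes, revisions_alive) pairs.
    """
    totals, long_lived = Counter(), Counter()
    for fragment_classes, revisions_alive in clone_sets:
        group = classify_clone_set(fragment_classes)
        totals[group] += 1
        if life_cycle_rate(revisions_alive, total_revisions) >= threshold:
            long_lived[group] += 1
    return {group: long_lived[group] / totals[group] for group in totals}


# Hypothetical data: three clone sets observed over 10 revisions.
sample = [(["A", "A"], 6), (["A", "B"], 2), (["A", "A", "B"], 5)]
result = share_long_lived(sample, total_revisions=10)
```

Applied to a real clone genealogy, the per-group fractions returned by `share_long_lived` would correspond to figures such as the roughly 60% and 35% reported in the abstract, provided the assumed definitions match those used in the paper.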

CLC Number: 

  • TP311.5