Computer Science ›› 2019, Vol. 46 ›› Issue (8): 224-232. doi: 10.11896/j.issn.1002-137X.2019.08.037

• Software and Database Technology •

Method for Identifying and Recommending Reconstructed Clones Based on Software Evolution History

SHE Rong-rong, ZHANG Li-ping

  1. (College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022, China)
  • Received: 2018-06-26 Online: 2019-08-15 Published: 2019-08-15
  • Corresponding author: ZHANG Li-ping (born 1974), female, master's degree, professor, CCF member. Her main research interests include software engineering and software analysis. E-mail: cieczlp@imnu.edu.cn
  • About the author: SHE Rong-rong (born 1991), female, master's student. Her main research interests include software engineering and software analysis.
  • Supported by:
    National Natural Science Foundation of China (61462071), Natural Science Foundation of Inner Mongolia (2018MS06009), Inner Mongolia Education Department project (NJZY17049), and Inner Mongolia Normal University Research Fund (2016ZRYB003)

Abstract: Existing research on code clone refactoring is limited to static analysis of a single version and ignores how clones evolve, so effective methods for deciding which clones to refactor are lacking. To address this, this paper first extracts the evolution history information closely related to code clones from clone detection, clone mapping, clone genealogies (clone families), and the software maintenance log management system. Second, it identifies the clones that need to be refactored as well as the clones that should be tracked. It then extracts static and evolutionary features related to refactoring and builds a feature sample database. Finally, it compares several machine learning methods and selects the best-performing classifier to recommend clones for refactoring. Experiments on nearly 170 versions of 7 software systems show that the accuracy of the refactoring recommendations exceeds 90%, giving software developers and maintainers more accurate and reasonable refactoring suggestions.

Key words: Clone family, Clone refactoring, Clone tracking, Code clone, Feature extraction

CLC Number:

  • TP311.5