Computer Science, 2020, Vol. 47, Issue (9): 10-16. doi: 10.11896/jsjkx.200400041
欧阳鹏1, 陆璐1,2, 张凡龙3, 邱少健4
OUYANG Peng1, LU Lu1,2, ZHANG Fan-long3, QIU Shao-jian4
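Abstract: In recent years, as demand for software has grown, developers have introduced large numbers of code clones into their projects by reusing existing code. As software versions iterate and are updated, code clones change, and these changes incur extra maintenance cost that gradually becomes a burden on software maintenance. Researchers have applied machine learning to predict code clone consistency-maintenance requirements: by predicting whether a change to a code clone will incur extra maintenance cost, they help software quality assurance teams allocate maintenance resources more effectively, which improves efficiency and reduces operation and maintenance cost. In the early stage of development, however, a software project has usually not evolved enough and lacks the historical data needed to build an effective prediction model, which is why cross-project prediction of code clone consistency-maintenance requirements was proposed. Taking the reduction of cross-project data distribution differences as its starting point, this paper proposes CPCCP+, a cross-project method for predicting code clone consistency-maintenance requirements based on transfer learning and over-sampling. CPCCP+ maps the test set and the training data set into a kernel space, uses transfer component analysis to reduce the distribution difference between projects, and handles the class imbalance in the data set, thereby improving the performance of the cross-project prediction model. For the experiments, 7 open-source data sets were selected, forming 42 cross-project prediction tasks in total. The proposed method is compared with methods that use only the base classifiers, with Precision, Recall and F-Measure as the evaluation metrics. The experimental results show that CPCCP+ predicts cross-project code clone consistency-maintenance requirements more effectively.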
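The pipeline described in the abstract (kernel mapping, transfer component analysis, class rebalancing, then Precision/Recall/F-Measure) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not name the kernel, the over-sampling algorithm, or any parameter values, so the RBF kernel, the plain random over-sampling, and the function names and defaults (`tca`, `random_oversample`, `precision_recall_f1`, `dim`, `mu`, `gamma`) are all assumptions made for illustration. The projection follows the standard transfer component analysis formulation, which takes the leading eigenvectors of (KLK + μI)⁻¹KHK.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    # Pairwise squared distances, then the RBF kernel (assumed kernel choice).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def tca(Xs, Xt, dim=2, mu=1.0, gamma=1.0):
    """Project source (Xs) and target (Xt) features into a shared subspace."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, gamma)
    # L encodes the maximum mean discrepancy between the two projects.
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    L = e @ e.T
    # H is the centering matrix that preserves data variance.
    H = np.eye(n) - np.ones((n, n)) / n
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(M)
    top = np.argsort(-vals.real)[:dim]
    W = vecs[:, top].real
    Z = K @ W
    return Z[:ns], Z[ns:]


def random_oversample(X, y, rng):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = [np.arange(len(y))]
    for c, k in zip(classes, counts):
        if k < target:
            members = np.flatnonzero(y == c)
            idx.append(rng.choice(members, size=target - k, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

In a cross-project setting, `Xs` would hold the labelled clone-change features of the source project and `Xt` the unlabelled features of the target project. A base classifier would be trained on the over-sampled, projected source data and evaluated on the projected target data with `precision_recall_f1`.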