Computer Science, 2020, Vol. 47, Issue (9): 10-16. doi: 10.11896/jsjkx.200400041

• Computer Software •

Cross-project Clone Consistency Prediction via Transfer Learning and Oversampling Technology

OUYANG Peng1, LU Lu1,2, ZHANG Fan-long3, QIU Shao-jian4   

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China
  2. Technology Research Institute, South China University of Technology, Meizhou, Guangdong 514021, China
  3. School of Computers, Guangdong University of Technology, Guangzhou 510006, China
  4. School of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
  • Received: 2020-04-09 Published: 2020-09-10
  • About author: OUYANG Peng, born in 1996, postgraduate. His main research interests include software reliability maintenance and transfer learning.
    LU Lu, born in 1971, Ph.D., professor, is a member of China Computer Federation. His main research interests include software engineering, software testing and software architecture design.
  • Supported by:
    National Natural Science Foundation of China (61370103), Industry-University-Research Foundation of Guangzhou (201902020004) and Industry-University-Research Project of Meizhou (2019A0101019).

Abstract: In recent years, as software requirements have grown, developers have introduced large amounts of cloned code into projects by reusing existing code. As the software evolves across versions, cloned code changes and can become a maintenance burden. Researchers have applied machine learning to predict clone code consistency: by predicting whether a change to cloned code will incur additional maintenance cost, such models help software quality assurance teams allocate maintenance resources more effectively, improving work efficiency and reducing maintenance costs. However, in the early stage of development, a software project has usually not evolved enough to provide the historical data needed to build an effective prediction model. Cross-project clone code consistency prediction methods have therefore been proposed. In this paper, we propose a cross-project clone code consistency prediction method based on transfer learning and oversampling technology (CPCCP+). The method maps the test set and the training set into a kernel space, reduces the distribution discrepancy between cross-project data by transfer component analysis, and alleviates the class imbalance issue through oversampling, in order to improve the performance of the cross-project prediction model. For the experiments, seven open-source datasets are selected, which form 42 cross-project clone code consistency prediction tasks in total (every ordered pair of distinct projects, 7 × 6). For model comparison, CPCCP+ is compared with a method that uses only the base classifier. The evaluation metrics are precision, recall and F-measure. The experimental results show that CPCCP+ performs cross-project clone code consistency prediction more effectively.
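The experimental setup in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. It shows how seven projects yield 42 ordered (training, test) tasks, one simple way to balance classes, and how precision, recall and F-measure are computed. The project names are hypothetical placeholders, and plain random oversampling is an assumption, since the abstract does not say which oversampling technique CPCCP+ uses. The transfer component analysis step is left out because it needs a linear algebra library.

```python
import random
from itertools import permutations

# Hypothetical placeholders: the abstract only says seven open-source datasets are used.
PROJECTS = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]


def cross_project_tasks(projects):
    """Return every ordered (source, target) pair: train on source, test on target."""
    return list(permutations(projects, 2))


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.

    Assumes binary labels 0/1. Random duplication is an assumption made for
    illustration, not necessarily the technique used in the paper.
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority
    deficit = len(by_class[majority]) - len(by_class[minority])
    extra = []
    if by_class[minority]:  # nothing to duplicate if the class is empty
        extra = [rng.choice(by_class[minority]) for _ in range(deficit)]
    return list(rows) + extra, list(labels) + [minority] * len(extra)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(len(cross_project_tasks(PROJECTS)))  # number of train/test tasks
    rows, labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])
    print(labels.count(0), labels.count(1))
    print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

In a setup like this, oversampling would normally be applied only to the training (source) project, so that the test (target) project keeps its original class distribution when the metrics are computed.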

Key words: Code clone, Consistent change, Cross-project prediction, Oversampling technology, Transfer learning

CLC Number: 

  • TP311