Software Self-admitted Technical Debt Identification with Bidirectional Gate Recurrent Unit and Attention Mechanism

Computer Science, 2022, Vol. 49, Issue (7): 212-219. doi: 10.11896/jsjkx.210500075

XIONG Luo-geng, ZHENG Shang, ZOU Hai-tao, YU Hua-long, GAO Shang

School of Computer, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu 212100, China

Received: 2021-05-12 Revised: 2021-09-06 Online: 2022-07-15 Published: 2022-07-12
Corresponding author: ZHENG Shang (szheng@just.edu.cn)
About author: XIONG Luo-geng (xlg935003328@sina.com), born in 1996, postgraduate. His main research interests include intelligent software engineering.
ZHENG Shang, born in 1983, Ph.D., associate professor, master's supervisor, is a member of China Computer Federation. His main research interests include intelligent software engineering and data mining.
Supported by:
Natural Science Research Foundation for Higher Education of Jiangsu Province (18JBK520011), Primary Research and Development Plan (Social Development) of Zhenjiang (SH2019021) and Natural Science Foundation of Jiangsu Province (BK20191457).

Abstract

Self-admitted technical debt (SATD) is recorded by developers in the source code comments of a project: a note in which the developer admits to deliberately leaving a defect in the software in pursuit of short-term benefit. A large amount of SATD hinders software maintenance. In recent years, a growing number of researchers have studied SATD identification and proposed various approaches, such as detection based on natural language processing or text mining. However, most of these approaches depend on existing lexicons or manually extracted features, which is time-consuming, increases computational complexity, and yields unsatisfactory identification results. This paper therefore proposes a SATD identification approach based on a bidirectional gated recurrent unit (GRU) network and an attention mechanism. Word vectors are first obtained with the Skip-gram model in Word2vec; a bidirectional GRU network is then built to extract high-level features; finally, the attention mechanism automatically discovers the words that play a key role in SATD classification, thereby capturing the most important semantic information. Experimental results show that the proposed approach performs well in precision, recall and F1-score. It identifies SATD effectively and avoids the complex feature engineering of traditional approaches.

Key words: Attention mechanism, GRU, SATD, Software maintenance, Word2vec
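
The attention step named in the abstract can be illustrated with a small sketch. This page does not give the paper's exact equations, so the code below assumes one common attention-pooling formulation: each word's hidden state is scored with a learned vector, the scores are normalized with a softmax, and the hidden states are summed with those weights. The function name `attention_pool`, the scoring vector `w`, and the toy hidden states are all hypothetical; in a real model the hidden states would come from the bidirectional GRU and `w` would be trained.

```python
import math

def attention_pool(hidden_states, w):
    """Collapse per-word hidden states into a single comment vector.

    hidden_states: list of T vectors (each a list of d floats), standing in
                   for the per-word outputs of a bidirectional GRU.
    w:             scoring vector of length d (assumed learned; fixed here).
    Returns (weights, pooled), where weights holds one value per word.
    """
    # Score each word as w . tanh(h_t) (assumed scoring function).
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h))
              for h in hidden_states]
    # Numerically stable softmax turns the scores into attention weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The pooled vector is the attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, hidden_states))
              for j in range(dim)]
    return weights, pooled

# Toy input: three "words" with 2-dimensional hidden states (made-up numbers).
H = [[0.1, 0.0], [2.0, 1.5], [0.2, -0.1]]
w = [1.0, 1.0]
weights, pooled = attention_pool(H, w)
print(weights)  # the second word receives the largest weight
print(pooled)
```

In a full classifier the pooled vector would be passed to an output layer that decides whether the comment is SATD; inspecting `weights` indicates which words drove that decision.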

CLC number: TP311

Cite this article:

XIONG Luo-geng, ZHENG Shang, ZOU Hai-tao, YU Hua-long, GAO Shang. Software Self-admitted Technical Debt Identification with Bidirectional Gate Recurrent Unit and Attention Mechanism[J]. Computer Science, 2022, 49(7): 212-219. https://doi.org/10.11896/jsjkx.210500075

