Computer Science, 2025, Vol. 52, Issue (6A): 250200086-11. doi: 10.11896/jsjkx.250200086
施恩译1, 常舒予2, 陈可佳2,3, 张扬2, 黄海平2,4
SHI Enyi1, CHANG Shuyu2, CHEN Kejia2,3, ZHANG Yang2, HUANG Haiping2,4
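Abstract: In modern, complex software projects, bugs and code stand in a many-to-many relationship: one bug is often caused by several code changesets, and one changeset can in turn cause several bugs. Consequently, a bug can usually be localized only partially, and it is hard to trace all of the related code. Conventional architectures extract the semantic features of a changeset or a bug independently, each relying only on its own context. Because modern projects are large and their code dependencies intricate, this isolated extraction lowers the quality and robustness of each individual text's semantic features and ultimately degrades localization performance. To trace all of the code related to a bug, this paper proposes the BiGCN-TL model. BiGCN-TL focuses on training the model's ability to promote information exchange between different texts, with the aim of reducing its dependence on the quality of any single text's semantic features. Even when the project is large, the dependencies are intricate, and single-text features are hard to extract, efficient information exchange still yields high-quality semantic features and improves localization accuracy. First, a Transformer-based pre-trained model is fine-tuned on the known partial localization relationships. Next, in a novel step, bugs and code changesets are modeled as a bipartite graph, which makes full use of the known many-to-many relationships, and the fine-tuned encoder provides the initial node feature representations. A link prediction task is then defined on the bipartite graph to train a graph convolutional network (GCN) and a binary classification discriminator. Graph convolution and an attention mechanism dynamically update the node features, so that training concentrates on the model's ability to exchange information between texts; this produces high-quality global classification features, from which the model outputs a matching prediction score. Comparative experiments on multiple datasets show that BiGCN-TL outperforms conventional approaches, and ablation experiments confirm the effectiveness of each module. In addition, experiments with several combinations of pre-trained models and GCNs, together with case studies and visualization analysis, further verify the generality and robustness of BiGCN-TL.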
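The pipeline described in the abstract (a bipartite graph built from known links, feature propagation over that graph, then link scoring) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' BiGCN-TL implementation: the feature vectors are hand-written toy values standing in for fine-tuned encoder outputs, the propagation step is a parameter-free neighbor average with no learned weights or attention, and a sigmoid over the dot product stands in for the trained binary discriminator. All names (`build_bipartite`, `propagate`, `link_score`, and the `bug*`/`cs*` identifiers) are hypothetical.

```python
import math
from collections import defaultdict


def build_bipartite(links):
    """Build adjacency maps for a bug/changeset bipartite graph."""
    bug_to_cs = defaultdict(set)
    cs_to_bug = defaultdict(set)
    for bug, cs in links:
        bug_to_cs[bug].add(cs)
        cs_to_bug[cs].add(bug)
    return bug_to_cs, cs_to_bug


def propagate(own, neighbors_of, other):
    """One simplified graph-convolution step (assumption: no learned weights).

    Each node's new feature is the average of its own feature and the
    mean feature of its neighbors on the other side of the graph.
    Nodes without neighbors keep their feature unchanged.
    """
    updated = {}
    for node, feat in own.items():
        nbrs = sorted(neighbors_of.get(node, ()))
        if not nbrs:
            updated[node] = list(feat)
            continue
        mean = [sum(other[n][i] for n in nbrs) / len(nbrs)
                for i in range(len(feat))]
        updated[node] = [(a + b) / 2 for a, b in zip(feat, mean)]
    return updated


def link_score(u, v):
    """Sigmoid of the dot product; a stand-in for a trained discriminator."""
    return 1 / (1 + math.exp(-sum(a * b for a, b in zip(u, v))))


# Known (partial) many-to-many localization links: toy data.
known_links = [("bug1", "cs1"), ("bug1", "cs2"), ("bug2", "cs2")]
bug_to_cs, cs_to_bug = build_bipartite(known_links)

# Toy initial node features (a real system would use encoder embeddings).
bug_feats = {"bug1": [1.0, 0.0], "bug2": [0.0, 1.0]}
cs_feats = {"cs1": [1.0, 0.0], "cs2": [0.5, 0.5], "cs3": [-1.0, -1.0]}

# Score the unobserved pair (bug2, cs1) before and after one propagation step.
before = link_score(bug_feats["bug2"], cs_feats["cs1"])
new_bug = propagate(bug_feats, bug_to_cs, cs_feats)
new_cs = propagate(cs_feats, cs_to_bug, bug_feats)
after = link_score(new_bug["bug2"], new_cs["cs1"])

print(f"score(bug2, cs1) before propagation: {before:.3f}")
print(f"score(bug2, cs1) after propagation:  {after:.3f}")
```

In this toy graph, `bug2` and `cs1` share no known link but are connected through `cs2` and `bug1`, so one round of propagation pulls their features closer together and raises the candidate score above its pre-propagation value. That is the kind of cross-text information exchange the abstract refers to, reduced here to its simplest form.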