Computer Science ›› 2025, Vol. 52 ›› Issue (6A): 250200086-11. doi: 10.11896/jsjkx.250200086

• Computer Software & Architecture •

  • Corresponding author: HUANG Haiping (hhp@njupt.edu.cn)
  • About author: (q22010115@njupt.edu.cn)

BiGCN-TL:Bipartite Graph Convolutional Neural Network Transformer Localization Model for Software Bug Partial Localization Scenarios

SHI Enyi1, CHANG Shuyu2, CHEN Kejia2,3, ZHANG Yang2, HUANG Haiping2,4   

  1. 1 Bell Honors School,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
    2 School of Computer Science,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
    3 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing,Nanjing University of Posts and Telecommunications,Nanjing 210023,China
    4 Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks,Nanjing 210023,China
  • Online:2025-06-16 Published:2025-06-12
  • About author:SHI Enyi,born in 2004,undergraduate(q22010115@njupt.edu.cn).His main research interests include AI-based security and deep learning.
    HUANG Haiping,born in 1981,Ph.D,professor,Ph.D supervisor,is a senior member of CCF(No.15253S).His main research interests include information security and data privacy in IoT.
  • Supported by:
    Major Research Plan of the National Natural Science Foundation of China(92467202),Open Fund of Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation(TK224013),Postgraduate Research and Practice Innovation Program of Jiangsu Province(KYCX24_1234,KYCX23_1077) and Innovation and Entrepreneurship Training Program for College students of Jiangsu Province(202410293085Z).


Abstract: In modern complex software projects,software bugs and code changes exhibit a “many-to-many” correspondence:a single bug is often caused by multiple code changes,and a single code change can introduce multiple bugs.As a result,bug localization is often only partial,making it difficult to trace all relevant code changes.Traditional architectures typically extract semantic features of code changes and bug reports independently,relying solely on their respective contexts.However,given the large scale of modern software projects and their intricate code dependencies,such independent semantic extraction reduces the quality and robustness of individual text representations,ultimately degrading localization performance.To achieve comprehensive tracing of code related to software bugs,this paper proposes BiGCN-TL.The model focuses on enhancing the information interaction between different textual inputs,aiming to reduce reliance on the quality of individual text features.Even in scenarios where large-scale software projects exhibit complex dependencies and semantic features are hard to extract from a single text,BiGCN-TL leverages efficient information exchange to obtain high-quality semantic representations,thereby improving localization accuracy.First,based on known partial localization relationships,we fine-tune a Transformer-based pre-trained model.Then,we innovatively model software bugs and code changes as a bipartite graph,fully exploiting the known “many-to-many” relationships,and use the fine-tuned encoder to generate the initial node representations.Next,we design a link prediction task on the bipartite graph and train a GCN together with a binary classification discriminator.Through graph convolution operations and attention mechanisms,node representations are dynamically updated,with an emphasis on training the model's ability to promote textual information interaction,yielding high-quality global classification features from which the final matching prediction scores are produced.Extensive comparative experiments on multiple datasets validate the superiority of BiGCN-TL over traditional approaches,and ablation studies confirm the effectiveness of each module.Furthermore,the generalizability and robustness of BiGCN-TL are verified by exploring various combinations of pre-trained models and GCNs,together with case studies and visualization analysis.
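The pipeline described in the abstract can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the Transformer-derived node features are replaced by random vectors, the trained binary discriminator by a fixed dot-product scorer, and the graph sizes are hypothetical toy values.

```python
# Sketch of bipartite-graph link prediction for bug localization (assumed toy setup).
# Bug reports and code changesets form the two node sets; known partial
# localization links give the edges; one GCN layer propagates information
# between linked bugs and changesets; a sigmoid dot product scores candidate links.
import numpy as np

rng = np.random.default_rng(0)
n_bugs, n_changes, dim = 3, 4, 8

# Known partial localization links: (bug_index, changeset_index), "many-to-many".
known_links = [(0, 0), (0, 1), (1, 1), (2, 3)]

# Stack bugs then changesets into one node set; build the bipartite adjacency.
n = n_bugs + n_changes
A = np.zeros((n, n))
for b, c in known_links:
    A[b, n_bugs + c] = A[n_bugs + c, b] = 1.0

# Symmetrically normalized adjacency with self-loops: Â = D^-1/2 (A + I) D^-1/2.
A_hat = A + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# One GCN layer: ReLU(Â H W). Each bug's representation absorbs information
# from the changesets it is linked to, and vice versa.
H = rng.normal(size=(n, dim))          # stand-in for pre-trained text encodings
W = rng.normal(size=(dim, dim)) * 0.1  # stand-in for a learned weight matrix
H_out = np.maximum(A_norm @ H @ W, 0)

def link_score(bug: int, change: int) -> float:
    """Sigmoid of the dot product between a bug node and a changeset node."""
    z = H_out[bug] @ H_out[n_bugs + change]
    return 1.0 / (1.0 + np.exp(-z))

scores = {(b, c): link_score(b, c) for b in range(n_bugs) for c in range(n_changes)}
```

In the paper's actual model the initial features `H` come from the fine-tuned Transformer encoder and the scorer is a trained binary classification discriminator; here both are replaced by untrained stand-ins to keep the sketch self-contained.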

Key words: Bug localization, Pre-trained model, Link prediction, Bipartite graph, Graph neural network

CLC Number: TP391