Computer Science, 2025, Vol. 52, Issue (6A): 250200086-11. doi: 10.11896/jsjkx.250200086

• Computer Software & Architecture •

BiGCN-TL: Bipartite Graph Convolutional Neural Network Transformer Localization Model for Software Bug Partial Localization Scenarios

SHI Enyi1, CHANG Shuyu2, CHEN Kejia2,3, ZHANG Yang2, HUANG Haiping2,4   

    1 Bell Honors School, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    2 School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    3 Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
    4 Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing 210023, China
  • Online:2025-06-16 Published:2025-06-12
  • About author: SHI Enyi, born in 2004, undergraduate. His main research interests include AI-based security and deep learning.
    HUANG Haiping, born in 1981, Ph.D., professor, Ph.D. supervisor, is a senior member of CCF (No. 15253S). His main research interests include information security and data privacy in IoT.
  • Supported by:
    Major Research Plan of the National Natural Science Foundation of China (92467202), Open Fund of Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation (TK224013), Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX24_1234, KYCX23_1077) and Innovation and Entrepreneurship Training Program for College Students of Jiangsu Province (202410293085Z).

Abstract: In modern complex software projects, software bugs and code changes exhibit a “many-to-many” correspondence: a single bug is often caused by multiple code changes, and a single code change can introduce multiple bugs. As a result, bug localization is often only partial, making it difficult to trace all relevant code changes. Traditional architectures typically extract semantic features of code changes and bug reports independently, relying solely on their respective contexts. However, given the large scale of modern software projects and their intricate code dependencies, such independent semantic extraction reduces the quality and robustness of individual text representations, ultimately degrading localization performance. To achieve comprehensive tracing of code related to software bugs, this paper proposes BiGCN-TL. The model focuses on enhancing the information interaction between different textual inputs, aiming to reduce reliance on the quality of individual text features. Even when large-scale software projects exhibit complex dependencies and semantic features are hard to extract from a single text, BiGCN-TL leverages efficient information exchange to extract high-quality semantic representations, thereby improving localization accuracy. First, based on known partial localization relationships, a Transformer-based pre-trained model is fine-tuned. Then, software bugs and code changes are modeled as a bipartite graph that leverages the known “many-to-many” relationships, and the fine-tuned encoder generates the initial node representations. Next, a link prediction task is designed on the bipartite graph to train a GCN and a binary classification discriminator. Through graph convolution operations and attention mechanisms, node representations are dynamically updated, which promotes textual information interaction and refines global classification features. The final output is a matching prediction score. Extensive comparative experiments on multiple datasets validate the superiority of BiGCN-TL over traditional approaches, and ablation studies confirm the effectiveness of each module. Furthermore, the generalizability and robustness of BiGCN-TL are verified by exploring a variety of combinations of pre-trained models and GCNs, combined with specific-case and visualization analyses.
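
The pipeline described in the abstract (bipartite graph of bugs and code changes, graph convolution over it, then a matching score) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the function names, the toy 2-dimensional vectors standing in for encoder output, the identity weight matrix, and the sigmoid dot-product scorer, which is swapped in for the paper's trained binary classification discriminator. The attention mechanism is omitted.

```python
import math


def normalized_bipartite_adjacency(num_bugs, num_changes, links):
    """Return D^{-1/2} (A + I) D^{-1/2} for a bug/change bipartite graph.

    Nodes 0..num_bugs-1 are bugs; the remaining nodes are code changes.
    `links` holds known (bug_index, change_index) pairs.
    """
    n = num_bugs + num_changes
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each node keeps its own features
    for bug, change in links:
        u, v = bug, num_bugs + change
        adj[u][v] = adj[v][u] = 1.0  # undirected bug-change edge
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(norm_adj, features, weight):
    """One graph convolution: ReLU(norm_adj @ features @ weight)."""
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Aggregate neighbour features through the normalized adjacency.
    agg = [[sum(norm_adj[i][k] * features[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]


def link_score(h_bug, h_change):
    """Hypothetical scorer: sigmoid of the dot product of two node vectors."""
    z = sum(a * b for a, b in zip(h_bug, h_change))
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: 2 bugs, 2 code changes, many-to-many known links.
num_bugs, num_changes = 2, 2
known_links = [(0, 0), (0, 1), (1, 1)]

# Stand-ins for the fine-tuned encoder's output (bugs first, then changes).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5], [0.5, 1.0]]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity; a trained model would learn this

norm_adj = normalized_bipartite_adjacency(num_bugs, num_changes, known_links)
hidden = gcn_layer(norm_adj, features, weight)

# Matching score between bug 0 and code change 0.
score = link_score(hidden[0], hidden[num_bugs + 0])
print(f"bug 0 vs change 0: {score:.3f}")
```

In this sketch each bug representation is mixed with the representations of its linked code changes (and vice versa), which is the kind of cross-text information interaction the abstract emphasizes; a real system would train the weights and the discriminator on held-out links.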

Key words: Bug localization, Pre-trained model, Link prediction, Bipartite graph, Graph neural network

CLC Number: TP391