计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 149-158.doi: 10.11896/jsjkx.210100200

• 计算机软件 • 上一篇    下一篇

基于图神经网络的软件系统中关键类的识别

张健雄1, 宋坤1, 何鹏1,2, 李兵3   

  1. 1 湖北大学计算机与信息工程学院 武汉430062
    2 应用数学湖北省重点实验室 武汉430062
    3 武汉大学计算机学院 武汉430072
  • 收稿日期:2021-01-26 修回日期:2021-05-09 出版日期:2021-12-15 发布日期:2021-11-26
  • 通讯作者: 何鹏(penghe@hubu.edu.cn)
  • 作者简介:930065661@qq.com
  • 基金资助:
    国家重点研发计划(2018YFB1003801);国家自然科学基金(61832014, 61902114, 61977021);湖北省科技重大专项(2019ACA144);应用数学湖北省重点实验室开放基金(HBAM201901)

Identification of Key Classes in Software Systems Based on Graph Neural Networks

ZHANG Jian-xiong1, SONG Kun1, HE Peng1,2, LI Bing3   

  1. 1 School of Computer and Information Engineering,Hubei University,Wuhan 430062,China
    2 Hubei Provincial Key Laboratory of Applied Mathematics,Wuhan 430062,China
    3 School of Computer Science,Wuhan University,Wuhan 430072,China
  • Received:2021-01-26 Revised:2021-05-09 Online:2021-12-15 Published:2021-11-26
  • About author:ZHANG Jian-xiong,born in 1998,postgraduate.His main research interests include network embedding,neural network and software network.
    HE Peng,born in 1988,Ph.D,associate professor,postgraduate supervisor,is a member of China Computer Federation.His main research interests include software engineering and complex networks.
  • Supported by:
    National Key R & D Program of China(2018YFB1003801),National Natural Science Foundation of China(61832014,61902114,61977021),Science and Technology Innovation Program of Hubei Province(2019ACA144) and Open Foundation of Hubei Key Laboratory of Applied Mathematics(HBAM201901).

摘要: 软件系统中通常存在一些在拓扑结构上处于核心位置的关键类,这些类上的缺陷往往会给系统带来极大的安全隐患,识别关键类对工程师理解或维护一个软件系统至关重要。针对这一问题,提出一种基于图神经网络的关键类识别方法。首先利用复杂网络理论,将软件系统抽象为软件网络;其次结合无监督网络节点嵌入学习以及邻域聚合的方式,构建一个编码-解码(encoder-decoder)框架,提取软件系统中类节点的表征向量;最后利用Pairwise排序学习实现网络中节点的重要性排序,从而实现软件系统中关键类的识别。为验证所提方法的有效性,选取4个Java开源软件作为实验对象,并与常用的5种节点重要性度量方法以及2个已有工作进行对比分析。实验结果表明:与介数中心性、K-core、接近中心性、节点收缩法和PageRank等方法相比,该方法识别关键类的效果更好;另外,相比已有工作,在前15%的关键类节点中,所提方法的召回率和准确率的提高幅度均在10%以上。

关键词: 关键类识别, 排序学习, 软件网络, 图神经网络, 网络嵌入

Abstract: There are usually some key classes which are in the core position in the topology structure of software systems.The defects in these classes will bring great security risks to the system.Therefore,it is very important to identify these key classes for engineers to understand or maintain an unfamiliar software system.To do this,the paper proposes a novel method of identifying key classes based on graph neural networks.Specifically,the software system is abstracted as software network by using complex network theory,and then by combining unsupervised network embedding learning and neighborhood aggregation mode,we construct an encoder-decoder framework to extract the representation vector of class nodes in software system.Finally,according to the obtained node representations,Pairwise learning-to-rank algorithm is adopted to realize the importance ranking of nodes,so as to achieve the identification of key classes in software system.In order to verify the effectiveness of our method,an empirical analysis of four object-oriented Java open-source software is done,and we compare it with five commonly used node importance measurement methods and two existing works.The experimental results show that,compared with node centrality,K-core and PageRank,the proposed method is more effective in identifying key classes from the perspective of network robustness.In addition,on the existing public labeled dataset,the recall and precision of this paper are better at the top 15% percent of nodes,and improved by more than 10%.

Key words: Graph neural network, Key class identification, Learning to rank, Network embedding, Software network

中图分类号: 

  • TP311.53
[1]PAN W,LI B,MA Y,et al.Measuring structural quality of object-oriented softwares via bug propagation analysis on weighted software networks[J].Journal of Computer Science and Technology,2010,25(6):1202-1213.
[2]MAGGIE H,KATERINA G P.Common trends in software fault and failure data[J].IEEE Transactions on Software Engineering,2009,35(4):484-496.
[3]DAI J Y,WANG B,SHENG J F,et al.Identifying influential nodes in complex networks based on local neighbor contribution[J].IEEE Access,2019,DOI:10.1109/ACCESS.2019.2939804.
[4]RUAN Y R,LAO S Y,WANG J D,et al.Node importance measurement based on neighborhood similarity in complex network[J].Journal of Physics,2017,66(3):1-8.
[5]TAN Y J,WU J,DENG H Z.Evaluation Method for Node Importance base on Node Contractionin Complex Networks[J].Systems Engineering Theory and Practice,2006,26(11):79-83.
[6]BIAN T,HU J T,DENG Y.Identifying influential nodes in complex networks based on AHP[J].Physica A Statistical Mechanics and its Applications,2017,479:422-436.
[7]WATTS D J,STROGATZ S H.Collective dynamics of ‘small-world' networks[J].Nature,1998,393:440-442.
[8]BARABÁSI A L,ALBERT R.Emergence of scaling in random networks[J].Science,1999,286:509-512.
[9]CONCAS G,LOCCI M F,PINNA S,et al.Fractal dimension in software networks[J].Europhysics Letters,2006,76(6):1221-1227.
[10]WANG X F,LI X,CHEN G R.Complex Network Theory and Its Application[M].Beijing:Tsinghua University Press,2006.
[11]HE K Q,MA Y T,LIU J,et al.Software Network[M].Beijing:Science Press,2008.
[12]MYERS C R.Software systems as complex networks:Struc- ture,function,and evolvability of software collaboration graphs[J].Phys. Rev. E,2003,68:46116.
[13]PAN W,LI B,MA Y,et al.Class structure refactoring of object-oriented softwares using community detection in dependency[J].Networks Frontiers of Computer Science in China,2009,3(3):396-404.
[14]VALVERDE S,CANCHO R F,SOLÉ R V.Scale free networks from optimal design[J].Europhysics Letters,2002,60(4):512-517.
[15]OPSAHL T,AGNEESSENS F,SKVORETZ J.Node centrality in weighted networks:generalizing degree and shortest paths[J].Social Networks,2010,32(3):245-251.
[16]KITSAK M,GALLOS L K,HAVLIN S,et al.Identification of influential spreaders in complex networks[J].Nature Physics,2010,6(11):888-893.
[17]LI P X,RENG Y Q,XI Y M.An Importance Measure of Actors(Set) within a Network[J].Systems Engineering,2004,22:13-20.
[18]BRIN S,PAGE L.The anatomy of a large-scale hypertextual web search engine[J].Computer Networks,2012,56(18):3825-3833.
[19]KLEINBERG J M.Authoritative sources in a hyperlinked environment[J].Journal of the ACM,1999,46(5):604-632.
[20]WANG M,PAN W.A comparative study of network centrality metrics in identifying key classes in software[J].Journal of Computational Information Systems,2012,8(24):10205-10212.
[21]PERIN F,RENGGLI L,RESSIA J.Ranking software artifacts[C]//Proceedings of 4th Workshop on FAMIX and Moose in Software Reengineering(FAMOO-Sr'10).2010:1-4.
[22]WANG M S,LU H M.Identifying Key Classes Using h-Index and its Variants[J].Computer Science and Exploration,2011,5(10):891-903.
[23]HU S W,LI B,HE P,et al.Approach Based h-index to Measu-ring the Important Classes in Software Network[J].Small Microcomputer System,2017(2):249-253.
[24]WANG Y,YU H,ZHU Z L.A Class Integration Test Order Method Based on the Node Importance of Software[J].Compu-ter Research and Development,2016,53(3):17-30.
[25]LI D W,LI B,HE P,et al.Ranking the importance of classes via software structural analysis[J].Future Communication,Computing,Control and Management,2012,141:441-449.
[26]JIANG S J,JU X L,WANG X Y,et al.Measuring the Importance of Classes Using UIO Sequence[J].Electronic Journals,2015,43(10):2062-2068.
[27]WANG J,AI J,YANG Y,et al.Identifying key classes of object-oriented software based on software complex network[C]//International Conference on System Reliability & Safety.IEEE,2017:444-449.
[28]SRINIVASAN S M,SANGWAN R S,NEILL C J.On the measures for ranking software components[J].Innovations in Systems and Software Engineering,2017,13:161-175.
[29]PAN W F,MING H,CARL K,et al.ElementRank:Ranking Java software classes and packages using multilayer complex network-based approach[C]//IEEE Transactions on Software Engineering.2019.
[30]PAN W F,SONG B B,LI K S,et al.Identifying key classes in object-oriented software using generalized k-core decomposition[J].Future Generation Computer Systems,2018,81:188-202.
[31]GORI M,MONFARDINI G,SCARSELLI F.A new model for learning in graph domains[C]//Proc. of IJCNN. IEEE,2005:729-734.
[32]SCARSELLI F,GORI M,TSOI A C,et al.The graph neural network model[J].IEEE Transactions on Neural Networks,2009,20(1):61-80.
[33]GALLICCHIO C,MICHELI A.Graph echo state networks [C]//IJCNN.IEEE,2010:1-8.
[34]SIMONOVSKY M,KOMODAKIS N.Graphvae:Towards gene-ration of small graphs using variational autoencoders[C]//International Conference on Artificial Neural Networks.2018:412-422.
[35]BOJCHEVSKI A,SHCHUR O,ZUGNER D,et al.Netgan: Generating graphs via random walks[C]//Proceedings of the 35th Int. Conference on Machine Learning.2018:610-619.
[36]LIU T Y.Learning to Rank for Information Retrieval[M].Berlin:Springer,2011.
[37]SHI Z,KEUNG J,BENNIN K E,et al.Comparing learning to rank techniques in hybrid bug localization[J].Applied Soft Computing,2018,62:636-648.
[38]BURGES C,SHAKED T,RENSHAW E,et al.Learning to rank using gradient descent[C]//Proceedings of the 22Nd International Conference on Machine Learning.Bonn,Germany,2005:89-96.
[39]FREUND Y,IYER R,SCHAPIRE R E,et al.An efficient boosting algorithm for combining preferences[J].Journal of Machine Learning Research,2004,4(6):170-178.
[40]WU Q,BURGES C C,SVORE K,et al.Adapting boosting for information retrieval measures[J].Inf. Retrieval.2010,13(3):254-270.
[41]KIPF T N,WELLING M.Semi-supervised classification with graph convolutional networks[C]//Proc. of International Conference on Learning Representations.Toulon,France,2017,arXiv:1609.02907v4.
[42]HAMILTON W L,YING R,LESKOVEC J.Inductive representation learning on large graphs[C]//31st Conference on Neural Information Processing Systems(NIPS 2017).Long Beach,CA,USA,2017:1-11.
[43]LI Y,TARLOW D,BROCKSCHMIDT M,et al.Gated graph sequence neural networks[J].arXiv:1511.05493,2015.
[44]XU K,LI C T,TIAN Y L,et al.Representation learning on graphs with jumping knowledge networks[C]//Proceedings of the 35th International Conference on Machine Learning.Stockholm,Sweden,2018:5449-5458.
[45]ZAIDMAN A,DEMEYER S.Automatic identification of key classes in a software system using web mining techniques[J].Journal of Software Maintenance and Evolution:Research and Practice,2008,20(6):387-417.
[46]IYER S,KILLINGBACK T,SUNDARAM B,et al.Attack robustness and centrality of complex networks[J].PLoS ONE,2013,8(4):e59613.
[47]FAN C,ZENG L,DING Y,et al.Learning to identify high betweenness centrality nodes from scratch:a novel graph neural network approach[J].arXiv:1905.10418v1.
[1] 周芳泉, 成卫青.
基于全局增强图神经网络的序列推荐
Sequence Recommendation Based on Global Enhanced Graph Neural Network
计算机科学, 2022, 49(9): 55-63. https://doi.org/10.11896/jsjkx.210700085
[2] 闫佳丹, 贾彩燕.
基于双图神经网络信息融合的文本分类方法
Text Classification Method Based on Information Fusion of Dual-graph Neural Network
计算机科学, 2022, 49(8): 230-236. https://doi.org/10.11896/jsjkx.210600042
[3] 齐秀秀, 王佳昊, 李文雄, 周帆.
基于概率元学习的矩阵补全预测融合算法
Fusion Algorithm for Matrix Completion Prediction Based on Probabilistic Meta-learning
计算机科学, 2022, 49(7): 18-24. https://doi.org/10.11896/jsjkx.210600126
[4] 杨炳新, 郭艳蓉, 郝世杰, 洪日昌.
基于数据增广和模型集成策略的图神经网络在抑郁症识别上的应用
Application of Graph Neural Network Based on Data Augmentation and Model Ensemble in Depression Recognition
计算机科学, 2022, 49(7): 57-63. https://doi.org/10.11896/jsjkx.210800070
[5] 熊中敏, 舒贵文, 郭怀宇.
融合用户偏好的图神经网络推荐模型
Graph Neural Network Recommendation Model Integrating User Preferences
计算机科学, 2022, 49(6): 165-171. https://doi.org/10.11896/jsjkx.210400276
[6] 邓朝阳, 仲国强, 王栋.
基于注意力门控图神经网络的文本分类
Text Classification Based on Attention Gated Graph Neural Network
计算机科学, 2022, 49(6): 326-334. https://doi.org/10.11896/jsjkx.210400218
[7] 余皑欣, 冯秀芳, 孙静宇.
结合物品相似性的社交信任推荐算法
Social Trust Recommendation Algorithm Combining Item Similarity
计算机科学, 2022, 49(5): 144-151. https://doi.org/10.11896/jsjkx.210300217
[8] 李勇, 吴京鹏, 张钟颖, 张强.
融合快速注意力机制的节点无特征网络链路预测算法
Link Prediction for Node Featureless Networks Based on Faster Attention Mechanism
计算机科学, 2022, 49(4): 43-48. https://doi.org/10.11896/jsjkx.210800276
[9] 曹合心, 赵亮, 李雪峰.
图神经网络在Text-to-SQL解析中的技术研究
Technical Research of Graph Neural Network for Text-to-SQL Parsing
计算机科学, 2022, 49(4): 110-115. https://doi.org/10.11896/jsjkx.210200173
[10] 苗旭鹏, 周跃, 邵蓥侠, 崔斌.
GSO:基于图神经网络的深度学习计算图子图替换优化框架
GSO:A GNN-based Deep Learning Computation Graph Substitutions Optimization Framework
计算机科学, 2022, 49(3): 86-91. https://doi.org/10.11896/jsjkx.210700199
[11] 陈世聪, 袁得嵛, 黄淑华, 杨明.
基于结构深度网络嵌入模型的节点标签分类算法
Node Label Classification Algorithm Based on Structural Depth Network Embedding Model
计算机科学, 2022, 49(3): 105-112. https://doi.org/10.11896/jsjkx.201000177
[12] 郭磊, 马廷淮.
基于好友亲密度的用户匹配
Friend Closeness Based User Matching
计算机科学, 2022, 49(3): 113-120. https://doi.org/10.11896/jsjkx.210200137
[13] 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞.
基于节点相似性和网络嵌入的复杂网络社区发现算法
Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding
计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[14] 郑苏苏, 关东海, 袁伟伟.
融合不完整多视图的异质信息网络嵌入方法
Heterogeneous Information Network Embedding with Incomplete Multi-view Fusion
计算机科学, 2021, 48(9): 68-76. https://doi.org/10.11896/jsjkx.210500203
[15] 张人之, 朱焱.
基于主动学习的社交网络恶意用户检测方法
Malicious User Detection Method for Social Network Based on Active Learning
计算机科学, 2021, 48(6): 332-337. https://doi.org/10.11896/jsjkx.200700151
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!