计算机科学 ›› 2021, Vol. 48 ›› Issue (12): 149-158.doi: 10.11896/jsjkx.210100200

• 计算机软件 • 上一篇    下一篇

基于图神经网络的软件系统中关键类的识别

张健雄1, 宋坤1, 何鹏1,2, 李兵3   

  1. 1 湖北大学计算机与信息工程学院 武汉430062
    2 应用数学湖北省重点实验室 武汉430062
    3 武汉大学计算机学院 武汉430072
  • 收稿日期:2021-01-26 修回日期:2021-05-09 出版日期:2021-12-15 发布日期:2021-11-26
  • 通讯作者: 何鹏(penghe@hubu.edu.cn)
  • 作者简介:930065661@qq.com
  • 基金资助:
    国家重点研发计划(2018YFB1003801);国家自然科学基金(61832014, 61902114, 61977021);湖北省科技重大专项(2019ACA144);应用数学湖北省重点实验室开放基金(HBAM201901)

Identification of Key Classes in Software Systems Based on Graph Neural Networks

ZHANG Jian-xiong1, SONG Kun1, HE Peng1,2, LI Bing3   

  1. 1 School of Computer and Information Engineering,Hubei University,Wuhan 430062,China
    2 Hubei Provincial Key Laboratory of Applied Mathematics,Wuhan 430062,China
    3 School of Computer Science,Wuhan University,Wuhan 430072,China
  • Received:2021-01-26 Revised:2021-05-09 Online:2021-12-15 Published:2021-11-26
  • About author:ZHANG Jian-xiong,born in 1998,postgraduate.His main research interests include network embedding,neural network and software network.
    HE Peng,born in 1988,Ph.D,associate professor,postgraduate supervisor,is a member of China Computer Federation.His main research interests include software engineering and complex networks.
  • Supported by:
    National Key R & D Program of China(2018YFB1003801),National Natural Science Foundation of China(61832014,61902114,61977021),Science and Technology Innovation Program of Hubei Province(2019ACA144) and Open Foundation of Hubei Key Laboratory of Applied Mathematics(HBAM201901).

摘要: 软件系统中通常存在一些在拓扑结构上处于核心位置的关键类,这些类上的缺陷往往会给系统带来极大的安全隐患,识别关键类对工程师理解或维护一个软件系统至关重要。针对这一问题,提出一种基于图神经网络的关键类识别方法。首先利用复杂网络理论,将软件系统抽象为软件网络;其次结合无监督网络节点嵌入学习以及邻域聚合的方式,构建一个编码-解码(encoder-decoder)框架,提取软件系统中类节点的表征向量;最后利用Pairwise排序学习实现网络中节点的重要性排序,从而实现软件系统中关键类的识别。为验证所提方法的有效性,选取4个Java开源软件作为实验对象,并与常用的5种节点重要性度量方法以及2个已有工作进行对比分析。实验结果表明:与介数中心性、K-core、接近中心性、节点收缩法和PageRank等方法相比,该方法识别关键类的效果更好;另外,相比已有工作,在前15%的关键类节点中,所提方法的召回率和准确率的提高幅度均在10%以上。

关键词: 软件网络, 关键类识别, 网络嵌入, 图神经网络, 排序学习

Abstract: There are usually some key classes which are in the core position in the topology structure of software systems.The defects in these classes will bring great security risks to the system.Therefore,it is very important to identify these key classes for engineers to understand or maintain an unfamiliar software system.To do this,the paper proposes a novel method of identifying key classes based on graph neural networks.Specifically,the software system is abstracted as software network by using complex network theory,and then by combining unsupervised network embedding learning and neighborhood aggregation mode,we construct an encoder-decoder framework to extract the representation vector of class nodes in software system.Finally,according to the obtained node representations,Pairwise learning-to-rank algorithm is adopted to realize the importance ranking of nodes,so as to achieve the identification of key classes in software system.In order to verify the effectiveness of our method,an empirical analysis of four object-oriented Java open-source software is done,and we compare it with five commonly used node importance measurement methods and two existing works.The experimental results show that,compared with node centrality,K-core and PageRank,the proposed method is more effective in identifying key classes from the perspective of network robustness.In addition,on the existing public labeled dataset,the recall and precision of this paper are better at the top 15% percent of nodes,and improved by more than 10%.

Key words: Software network, Key class identification, Network embedding, Graph neural network, Learning to rank

中图分类号: 

  • TP311.53
[1]PAN W,LI B,MA Y,et al.Measuring structural quality of object-oriented softwares via bug propagation analysis on weighted software networks[J].Journal of Computer Science and Technology,2010,25(6):1202-1213.
[2]MAGGIE H,KATERINA G P.Common trends in software fault and failure data[J].IEEE Transactions on Software Engineering,2009,35(4):484-496.
[3]DAI J Y,WANG B,SHENG J F,et al.Identifying influential nodes in complex networks based on local neighbor contribution[J].IEEE Access,2019,DOI:10.1109/ACCESS.2019.2939804.
[4]RUAN Y R,LAO S Y,WANG J D,et al.Node importance measurement based on neighborhood similarity in complex network[J].Journal of Physics,2017,66(3):1-8.
[5]TAN Y J,WU J,DENG H Z.Evaluation Method for Node Importance base on Node Contractionin Complex Networks[J].Systems Engineering Theory and Practice,2006,26(11):79-83.
[6]BIAN T,HU J T,DENG Y.Identifying influential nodes in complex networks based on AHP[J].Physica A Statistical Mechanics and its Applications,2017,479:422-436.
[7]WATTS D J,STROGATZ S H.Collective dynamics of ‘small-world' networks[J].Nature,1998,393:440-442.
[8]BARABÁSI A L,ALBERT R.Emergence of scaling in random networks[J].Science,1999,286:509-512.
[9]CONCAS G,LOCCI M F,PINNA S,et al.Fractal dimension in software networks[J].Europhysics Letters,2006,76(6):1221-1227.
[10]WANG X F,LI X,CHEN G R.Complex Network Theory and Its Application[M].Beijing:Tsinghua University Press,2006.
[11]HE K Q,MA Y T,LIU J,et al.Software Network[M].Beijing:Science Press,2008.
[12]MYERS C R.Software systems as complex networks:Struc- ture,function,and evolvability of software collaboration graphs[J].Phys. Rev. E,2003,68:46116.
[13]PAN W,LI B,MA Y,et al.Class structure refactoring of object-oriented softwares using community detection in dependency[J].Networks Frontiers of Computer Science in China,2009,3(3):396-404.
[14]VALVERDE S,CANCHO R F,SOLÉ R V.Scale free networks from optimal design[J].Europhysics Letters,2002,60(4):512-517.
[15]OPSAHL T,AGNEESSENS F,SKVORETZ J.Node centrality in weighted networks:generalizing degree and shortest paths[J].Social Networks,2010,32(3):245-251.
[16]KITSAK M,GALLOS L K,HAVLIN S,et al.Identification of influential spreaders in complex networks[J].Nature Physics,2010,6(11):888-893.
[17]LI P X,RENG Y Q,XI Y M.An Importance Measure of Actors(Set) within a Network[J].Systems Engineering,2004,22:13-20.
[18]BRIN S,PAGE L.The anatomy of a large-scale hypertextual web search engine[J].Computer Networks,2012,56(18):3825-3833.
[19]KLEINBERG J M.Authoritative sources in a hyperlinked environment[J].Journal of the ACM,1999,46(5):604-632.
[20]WANG M,PAN W.A comparative study of network centrality metrics in identifying key classes in software[J].Journal of Computational Information Systems,2012,8(24):10205-10212.
[21]PERIN F,RENGGLI L,RESSIA J.Ranking software artifacts[C]//Proceedings of 4th Workshop on FAMIX and Moose in Software Reengineering(FAMOO-Sr'10).2010:1-4.
[22]WANG M S,LU H M.Identifying Key Classes Using h-Index and its Variants[J].Computer Science and Exploration,2011,5(10):891-903.
[23]HU S W,LI B,HE P,et al.Approach Based h-index to Measu-ring the Important Classes in Software Network[J].Small Microcomputer System,2017(2):249-253.
[24]WANG Y,YU H,ZHU Z L.A Class Integration Test Order Method Based on the Node Importance of Software[J].Compu-ter Research and Development,2016,53(3):17-30.
[25]LI D W,LI B,HE P,et al.Ranking the importance of classes via software structural analysis[J].Future Communication,Computing,Control and Management,2012,141:441-449.
[26]JIANG S J,JU X L,WANG X Y,et al.Measuring the Importance of Classes Using UIO Sequence[J].Electronic Journals,2015,43(10):2062-2068.
[27]WANG J,AI J,YANG Y,et al.Identifying key classes of object-oriented software based on software complex network[C]//International Conference on System Reliability & Safety.IEEE,2017:444-449.
[28]SRINIVASAN S M,SANGWAN R S,NEILL C J.On the measures for ranking software components[J].Innovations in Systems and Software Engineering,2017,13:161-175.
[29]PAN W F,MING H,CARL K,et al.ElementRank:Ranking Java software classes and packages using multilayer complex network-based approach[C]//IEEE Transactions on Software Engineering.2019.
[30]PAN W F,SONG B B,LI K S,et al.Identifying key classes in object-oriented software using generalized k-core decomposition[J].Future Generation Computer Systems,2018,81:188-202.
[31]GORI M,MONFARDINI G,SCARSELLI F.A new model for learning in graph domains[C]//Proc. of IJCNN. IEEE,2005:729-734.
[32]SCARSELLI F,GORI M,TSOI A C,et al.The graph neural network model[J].IEEE Transactions on Neural Networks,2009,20(1):61-80.
[33]GALLICCHIO C,MICHELI A.Graph echo state networks [C]//IJCNN.IEEE,2010:1-8.
[34]SIMONOVSKY M,KOMODAKIS N.Graphvae:Towards gene-ration of small graphs using variational autoencoders[C]//International Conference on Artificial Neural Networks.2018:412-422.
[35]BOJCHEVSKI A,SHCHUR O,ZUGNER D,et al.Netgan: Generating graphs via random walks[C]//Proceedings of the 35th Int. Conference on Machine Learning.2018:610-619.
[36]LIU T Y.Learning to Rank for Information Retrieval[M].Berlin:Springer,2011.
[37]SHI Z,KEUNG J,BENNIN K E,et al.Comparing learning to rank techniques in hybrid bug localization[J].Applied Soft Computing,2018,62:636-648.
[38]BURGES C,SHAKED T,RENSHAW E,et al.Learning to rank using gradient descent[C]//Proceedings of the 22Nd International Conference on Machine Learning.Bonn,Germany,2005:89-96.
[39]FREUND Y,IYER R,SCHAPIRE R E,et al.An efficient boosting algorithm for combining preferences[J].Journal of Machine Learning Research,2004,4(6):170-178.
[40]WU Q,BURGES C C,SVORE K,et al.Adapting boosting for information retrieval measures[J].Inf. Retrieval.2010,13(3):254-270.
[41]KIPF T N,WELLING M.Semi-supervised classification with graph convolutional networks[C]//Proc. of International Conference on Learning Representations.Toulon,France,2017,arXiv:1609.02907v4.
[42]HAMILTON W L,YING R,LESKOVEC J.Inductive representation learning on large graphs[C]//31st Conference on Neural Information Processing Systems(NIPS 2017).Long Beach,CA,USA,2017:1-11.
[43]LI Y,TARLOW D,BROCKSCHMIDT M,et al.Gated graph sequence neural networks[J].arXiv:1511.05493,2015.
[44]XU K,LI C T,TIAN Y L,et al.Representation learning on graphs with jumping knowledge networks[C]//Proceedings of the 35th International Conference on Machine Learning.Stockholm,Sweden,2018:5449-5458.
[45]ZAIDMAN A,DEMEYER S.Automatic identification of key classes in a software system using web mining techniques[J].Journal of Software Maintenance and Evolution:Research and Practice,2008,20(6):387-417.
[46]IYER S,KILLINGBACK T,SUNDARAM B,et al.Attack robustness and centrality of complex networks[J].PLoS ONE,2013,8(4):e59613.
[47]FAN C,ZENG L,DING Y,et al.Learning to identify high betweenness centrality nodes from scratch:a novel graph neural network approach[J].arXiv:1905.10418v1.
[1] 郑苏苏, 关东海, 袁伟伟. 融合不完整多视图的异质信息网络嵌入方法[J]. 计算机科学, 2021, 48(9): 68-76.
[2] 张人之, 朱焱. 基于主动学习的社交网络恶意用户检测方法[J]. 计算机科学, 2021, 48(6): 332-337.
[3] 胡昕彤, 沙朝锋, 刘艳君. 基于随机投影和主成分分析的网络嵌入后处理算法[J]. 计算机科学, 2021, 48(5): 124-129.
[4] 李思迪, 郭炳晖, 杨小博. 基于图神经网络的金融征信研究[J]. 计算机科学, 2021, 48(4): 85-90.
[5] 杨旭华, 王晨. 基于网络嵌入与局部合力的复杂网络社区划分算法[J]. 计算机科学, 2021, 48(4): 229-236.
[6] 徐新黎, 肖云月, 龙海霞, 杨旭华, 毛剑飞. 基于矩阵分解的属性网络嵌入和社区发现算法[J]. 计算机科学, 2021, 48(12): 204-211.
[7] 宁懿昕, 谢辉, 姜火文. 图神经网络社区发现研究综述[J]. 计算机科学, 2021, 48(11A): 11-16.
[8] 邢长征, 朱金侠, 孟祥福, 齐雪月, 朱尧, 张峰, 杨一鸣. 兴趣点推荐方法研究综述[J]. 计算机科学, 2021, 48(11A): 176-183.
[9] 方磊, 魏强, 武泽慧, 杜江, 张兴明. 基于神经网络的二进制函数相似性检测技术[J]. 计算机科学, 2021, 48(10): 286-293.
[10] 丁钰, 魏浩, 潘志松, 刘鑫. 网络表示学习算法综述[J]. 计算机科学, 2020, 47(9): 52-59.
[11] 王慧, 乐孜纯, 龚轩, 武玉坤, 左浩. 基于特征分类的链路预测方法综述[J]. 计算机科学, 2020, 47(8): 302-312.
[12] 余敦辉, 成涛, 袁旭. 基于排序学习的软件众包任务推荐算法[J]. 计算机科学, 2020, 47(12): 106-113.
[13] 吴勇, 王斌君, 翟一鸣, 仝鑫. 共引增强有向网络嵌入研究[J]. 计算机科学, 2020, 47(12): 279-284.
[14] 杨洋, 邸一得, 刘俊晖, 易超, 周维. 基于张量分解的排序学习在个性化标签推荐中的研究[J]. 计算机科学, 2020, 47(11A): 515-519.
[15] 赵霞, 李娴, 张泽华, 张晨威. 结合社区嵌入和节点嵌入的社区发现算法[J]. 计算机科学, 2020, 47(10): 121-125.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
[1] 马扬, 程光权, 梁星星, 李妍, 杨雨灵, 刘忠. 有向加权网络中的改进SDNE算法[J]. 计算机科学, 2020, 47(4): 233 -237 .
[2] 张清琪, 刘漫丹. 复杂网络社区发现的多目标五行环优化算法[J]. 计算机科学, 2020, 47(8): 284 -290 .
[3] 朱国晖, 张茵, 刘秀霞, 孙天骜. 节点拓扑感知的高效节能虚拟网络映射算法[J]. 计算机科学, 2020, 47(9): 270 -274 .
[4] 杨经纬, 魏子麒, 刘璘. 用户如何看待产品中的预测分析功能?——面向非功能性需求的调研报告[J]. 计算机科学, 2020, 47(12): 18 -24 .
[5] 潘孝勤, 芦天亮, 杜彦辉, 仝鑫. 基于深度学习的语音合成与转换技术综述[J]. 计算机科学, 2021, 48(8): 200 -208 .
[6] 王俊, 王修来, 庞威, 赵鸿飞. 面向科技前瞻预测的大数据治理研究[J]. 计算机科学, 2021, 48(9): 36 -42 .
[7] 余力, 杜启翰, 岳博妍, 向君瑶, 徐冠宇, 冷友方. 基于强化学习的推荐研究综述[J]. 计算机科学, 2021, 48(10): 1 -18 .
[8] 王梓强, 胡晓光, 李晓筱, 杜卓群. 移动机器人全局路径规划算法综述[J]. 计算机科学, 2021, 48(10): 19 -29 .
[9] 高洪皓, 郑子彬, 殷昱煜, 丁勇. 区块链技术专题序言[J]. 计算机科学, 2021, 48(11): 1 -3 .
[10] 毛瀚宇, 聂铁铮, 申德荣, 于戈, 徐石成, 何光宇. 区块链即服务平台关键技术及发展综述[J]. 计算机科学, 2021, 48(11): 4 -11 .