Computer Science, 2024, Vol. 51, Issue (3): 48-55. doi: 10.11896/jsjkx.221200158

• Database & Big Data & Data Science •


Framework and Algorithms for Accelerating Training of Semi-supervised Graph Neural Network Based on Heuristic Coarsening Algorithms

CHEN Yufeng, HUANG Zengfeng

  1. School of Data Science, Fudan University, Shanghai 200433, China
  • Received: 2022-12-27 Revised: 2023-05-25 Online: 2024-03-15 Published: 2024-03-13
  • Corresponding author: HUANG Zengfeng (huangzf@fudan.edu.cn)
  • About author: CHEN Yufeng (20210980146@fudan.edu.cn), born in 1999, postgraduate. His main research interest is graph neural networks. HUANG Zengfeng, born in 1986, Ph.D., researcher, doctoral advisor. His main research interests include machine learning algorithms and theory, big data, and theoretical computer science.


Abstract: Graph neural networks (GNNs) are the mainstream tool of graph machine learning at the current stage and have broad development prospects. By constructing an abstract graph structure, GNN models can efficiently handle problems in a variety of application scenarios, including node prediction, link prediction, and graph classification. However, application to large-scale graphs has always been a key difficulty in GNN training, and how to train and deploy GNNs effectively and quickly on large-scale graph data is a major obstacle to their further industrial adoption. Because a GNN exploits the topological information of the graph structure, it achieves better results on node prediction than general-purpose neural networks such as multi-layer perceptrons. However, growth in the number of nodes and edges constrains GNN training: real-world datasets reach tens of millions or even hundreds of millions of nodes, and dense graphs likewise reach tens of millions of edges, which makes it difficult for traditional GNN training methods to succeed directly. This paper improves on prior work and proposes a new GNN training framework based on heuristic graph coarsening algorithms, develops two concrete training algorithms on top of it, and further proposes two simple heuristic graph coarsening algorithms. With an acceptable loss of accuracy and greatly reduced memory consumption, the proposed algorithms significantly reduce the computational cost of GNN training and shorten training time. Experiments show that satisfactory results are achieved on common datasets.
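
To make the coarsening-then-train idea concrete: if a partition assigns each original node to a super-node via a one-hot matrix P, the coarsened adjacency is A_c = PᵀAP and the coarsened features are partition-wise averages of X, after which an ordinary GNN is trained on the much smaller coarse graph. The sketch below is a minimal NumPy illustration of this generic construction, in the spirit of the coarsening-based training the paper builds on; it is not the paper's implementation, and the function name, dense-matrix representation, and toy data are assumptions for readability.

```python
import numpy as np

def coarsen_graph(A, X, partition):
    """Generic graph coarsening sketch (illustrative, not the paper's API).

    A         : (n, n) dense adjacency matrix of the original graph
    X         : (n, d) node feature matrix
    partition : (n,) array, partition[i] = cluster id of node i
    Returns the coarsened adjacency A_c = P^T A P, cluster-averaged
    features X_c, and the one-hot assignment matrix P (n x k).
    """
    n = A.shape[0]
    k = partition.max() + 1
    # One-hot assignment matrix: P[i, c] = 1 iff node i is in cluster c.
    P = np.zeros((n, k))
    P[np.arange(n), partition] = 1.0
    # Coarse adjacency: A_c[c, c'] sums the edge weights between clusters.
    A_c = P.T @ A @ P
    # Coarse features: average the features of the merged nodes.
    sizes = P.sum(axis=0, keepdims=True).T          # (k, 1) cluster sizes
    X_c = (P.T @ X) / sizes
    return A_c, X_c, P

# Toy usage: 6 nodes merged into 3 super-nodes.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
X = np.random.randn(6, 4)
partition = np.array([0, 0, 0, 1, 2, 2])
A_c, X_c, P = coarsen_graph(A, X, partition)
print(A_c.shape, X_c.shape)   # (3, 3) (3, 4)
```

A standard GCN trained on (A_c, X_c) then sees a graph with k nodes instead of n, which is the source of the memory and time savings the abstract describes.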

Key words: Graph neural network, Graph coarsening, Training acceleration, Heuristic, Random walk, Unbiased
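
The keywords name random walks and unbiasedness, but the abstract does not spell out the paper's two heuristic coarsening algorithms. The following sketch is therefore only a hypothetical illustration of one way a random-walk heuristic can build a coarsening partition: each still-unassigned node takes a short random walk and is merged with the unassigned node the walk visits most often. The adjacency-dict format, walk length, and function name are all assumptions, not the paper's method.

```python
import random
from collections import Counter

def random_walk_partition(adj, walk_len=4, seed=0):
    """Hypothetical random-walk heuristic for a coarsening partition.

    adj : dict mapping node -> list of neighbor nodes
    Returns a dict node -> cluster id. Each unassigned node is merged
    with the unassigned node its short random walk visits most often.
    """
    rng = random.Random(seed)
    cluster = {}
    next_id = 0
    for v in adj:
        if v in cluster:
            continue
        # Run one short walk from v and count visited unassigned nodes.
        visits = Counter()
        cur = v
        for _ in range(walk_len):
            if not adj[cur]:
                break
            cur = rng.choice(adj[cur])
            if cur not in cluster and cur != v:
                visits[cur] += 1
        cluster[v] = next_id
        if visits:
            # Merge v with its most-visited unassigned co-walker.
            partner = visits.most_common(1)[0][0]
            cluster[partner] = next_id
        next_id += 1
    return cluster

# Toy usage on a 6-node path graph.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(random_walk_partition(adj))
```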

CLC Number: TP391