Computer Science ›› 2025, Vol. 52 ›› Issue (5): 149-160. doi: 10.11896/jsjkx.240200016

• Database & Big Data & Data Science •

Study on Graph Data Augmentation Based on Graph Entropy Theory

FU Kun1, CUI Jingyuan1, DANG Xing2,3, CHENG Xiao2,3, YING Shicong1, LI Jianwei1   

  1 School of Artificial Intelligence and Data Science, Hebei University of Technology, Tianjin 300401, China
  2 Tianjin Institute of Aerospace Mechanical and Electrical Equipment, Tianjin 300462, China
  3 Tianjin Key Laboratory of Aerospace Intelligent Equipment Technology, Tianjin 300462, China
  • Received: 2024-02-02  Revised: 2024-05-25  Online: 2025-05-15  Published: 2025-05-12
  • About author: FU Kun, born in 1979, Ph.D, associate professor. Her main research interests include network representation learning and social network analysis.
  • Supported by:
    National Natural Science Foundation of China (62072154), Tianjin Science and Technology Plan Project (22JCYBJC01740) and Hebei Province Major Science and Technology Achievement Transformation Special Project (22280803Z).

Abstract: Graph data augmentation is a technique for improving the performance of graph neural networks by transforming and expanding the graph structure and node features, thereby increasing the diversity and quantity of training data. Existing augmentation methods find it difficult to account simultaneously for the integrity of information structures, the smoothness of the feature manifold, the diversity of the graph, and local dependencies. In addition, over-smoothing and over-fitting arise during the training of graph neural networks and limit their learning capability. To address these issues, a graph data augmentation model (NRGE) based on the entropy theory of thermodynamics is proposed. First, a novel definition of graph entropy is introduced to measure the smoothness of the feature manifold, and a new augmentation strategy, whose main idea is to reduce the loss of graph entropy, is proposed to generate more suitable training data. Second, the sampled neighbors of each node are augmented to keep the augmentation consistent; to increase its diversity, first-order neighbors are randomly replaced with their second-order neighbors. Finally, a neighbor-constrained regularization method is introduced that improves model performance by enforcing prediction consistency between augmented neighbors. Ablation experiments show that NRGE effectively reduces the loss of graph entropy by preserving triangle structures, thereby improving the learning effect. NRGE is trained on three real-world datasets, and the resulting low-dimensional representations are applied to node classification. Compared with the baseline methods, NRGE improves performance by 1.1% on Cora and 0.8% on Citeseer, with a slight decrease of 0.4% on Pubmed. The experimental results show that NRGE can significantly enhance the performance and generalization ability of graph neural networks.
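The paper's own graph entropy definition is not reproduced on this page, so the sketch below is only a loose illustration of the idea: it normalizes per-edge feature "energies" into a probability distribution and takes its Shannon entropy as a feature-manifold smoothness score, and it counts triangles, the structures the ablation preserves to limit entropy loss. The function names and the energy construction here are assumptions for illustration, not the paper's formulas.

```python
import math

def edge_energy_entropy(adj, feats, eps=1e-12):
    """Illustrative graph-entropy-style score (not the paper's formula):
    normalize squared feature differences over edges into a distribution
    and take its Shannon entropy as a smoothness proxy."""
    energies = []
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:  # visit each undirected edge once
                energies.append(sum((a - b) ** 2
                                    for a, b in zip(feats[u], feats[v])))
    total = sum(energies) + eps
    probs = [e / total for e in energies]
    return -sum(p * math.log(p + eps) for p in probs if p > 0)

def triangle_count(adj):
    """Count triangles; per the abstract's ablation, preserving these
    structures limits the loss of graph entropy under augmentation."""
    count = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            for w in nbrs:
                if v < w and w in adj[v]:
                    count += 1
    return count // 3  # each triangle is seen from all three vertices

# toy undirected graph: path 0-1-2-3 plus triangle 1-2-4
adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3, 4}, 3: {2}, 4: {1, 2}}
feats = {n: [float(n), 1.0] for n in adj}
print(edge_energy_entropy(adj, feats), triangle_count(adj))
```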
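The neighbor replacement step is the most concretely described part of the abstract: to diversify augmentation, some first-order neighbors of a node are randomly swapped for second-order neighbors. A minimal sketch on a plain adjacency dict, assuming a replacement probability `p` and a uniform choice among second-order neighbors (both illustrative details not specified on this page):

```python
import random

def replace_with_second_order(adj, p=0.2, seed=0):
    """Randomly swap some first-order neighbors of each node for
    second-order neighbors, diversifying the sampled neighborhoods
    while staying in the local region.

    adj: dict mapping node -> set of neighbors (undirected graph).
    p:   probability of replacing each first-order neighbor.
    Returns a new dict of augmented (possibly asymmetric) sampled
    neighborhoods; the input graph is not modified.
    """
    rng = random.Random(seed)
    aug = {u: set(nbrs) for u, nbrs in adj.items()}
    for u, nbrs in adj.items():
        # second-order neighbors: neighbors of neighbors,
        # excluding u itself and its first-order neighbors
        second = set()
        for v in nbrs:
            second |= adj[v]
        second -= nbrs | {u}
        if not second:
            continue
        for v in list(nbrs):
            if rng.random() < p:
                w = rng.choice(sorted(second))  # sorted for reproducibility
                aug[u].discard(v)
                aug[u].add(w)
    return aug

adj = {0: {1}, 1: {0, 2, 4}, 2: {1, 3, 4}, 3: {2}, 4: {1, 2}}
print(replace_with_second_order(adj, p=0.5))
```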
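The neighbor-constrained regularization enforces prediction consistency between a node and its augmented neighbors. A minimal PyTorch sketch, assuming a symmetric KL divergence over paired log-softmax outputs; the paper's exact loss and pairing scheme may differ:

```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(log_probs, pairs):
    """Consistency regularizer in the spirit of the abstract: pull the
    predictive distributions of a node and its augmented neighbors
    together.

    log_probs: (N, C) log-softmax outputs of the GNN for N nodes.
    pairs:     LongTensor of shape (M, 2); each row (u, v) links a
               node to one of its augmented neighbors.
    """
    u, v = pairs[:, 0], pairs[:, 1]
    p_u, p_v = log_probs[u], log_probs[v]
    # symmetric KL divergence between the paired distributions
    kl_uv = F.kl_div(p_u, p_v.exp(), reduction="batchmean")
    kl_vu = F.kl_div(p_v, p_u.exp(), reduction="batchmean")
    return 0.5 * (kl_uv + kl_vu)

# usage with logits from any GNN, e.g. a 2-layer GCN
logits = torch.randn(5, 3)
log_probs = F.log_softmax(logits, dim=-1)
pairs = torch.tensor([[0, 1], [1, 2], [2, 4]])
print(neighbor_consistency_loss(log_probs, pairs).item())
```

This term would typically be added to the supervised classification loss with a weighting coefficient, so that labeled-node accuracy and neighborhood consistency are optimized jointly.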

Key words: Graph entropy, Graph data augmentation, Neighbor replacement, Consistency and diversity, Structural enhancement

CLC Number: TP391