计算机科学 ›› 2021, Vol. 48 ›› Issue (3): 201-205.doi: 10.11896/jsjkx.191200156

• 人工智能 • 上一篇    下一篇

基于KNN与矩阵变换的图节点嵌入归纳式学习算法

贺苗苗, 郭卫斌   

  1. 华东理工大学信息科学与工程学院 上海200237
  • 收稿日期:2019-12-26 修回日期:2020-05-19 出版日期:2021-03-15 发布日期:2021-03-05
  • 通讯作者: 郭卫斌(gweibin@ecust.edu.cn)
  • 作者简介:3110748768@qq.com
  • 基金资助:
    国家自然科学基金(61672227)

Inductive Learning Algorithm of Graph Node Embedding Based on KNN and Matrix Transform

HE Miao-miao, GUO Wei-bin   

  1. School of Information Science and Engineering,East China University of Science and Technology,Shanghai 200237,China
  • Received:2019-12-26 Revised:2020-05-19 Online:2021-03-15 Published:2021-03-05
  • About author:HE Miao-miao,born in 1995,postgra-duate.Her main research interests include natural language processing and so on.
    GUO Wei-bin,born in 1968,Ph.D,professor,is a member of China Computer Federation.His main research interests include high performance computing,software engineering and big data.
  • Supported by:
    National Natural Science Foundation of China(61672227).

摘要: 图节点的低维嵌入在各种预测任务中是非常有用的,如蛋白质功能预测、内容推荐等。然而,多数方法不能自然推广到不可见节点。图采样聚合算法(Graph Sample and Aggregate,Graphsage)虽然可以提高不可见节点生成嵌入的速度,但容易引入噪声数据,且生成的节点嵌入的表示能力不高。为此,文中提出了一种基于KNN与矩阵变换的图节点嵌入归纳式学习算法。首先,通过KNN选取K个邻节点;然后,根据聚合函数生成聚合信息;最后,利用矩阵变换与全连接层对聚合信息和节点信息进行计算,得到新的节点嵌入。为了有效权衡计算时间与性能,文中提出一种新的聚合函数,对邻节点特征运用最大池化作为聚合信息输出,以更多地保留邻节点信息,降低计算代价。在reddit和PPI 两个数据集上的实验表明,所提算法在micro-f1和macro-f1两个评价指标上分别获得了4.995%与10.515%的提升。因此,该算法可以大幅减少噪声数据,提高节点嵌入的表示能力,快速有效地为不可见节点及不可见图生成节点嵌入。

关键词: KNN, 表示能力, 低维嵌入, 节点嵌入, 聚合函数

Abstract: Low-dimensional embedding of graph nodes is very useful in various prediction tasks,such as protein function prediction,content recommendation and so on.However,most methods cannot be naturally extended to invisible nodes.Graph Sample and Aggregate (Graph Sample and Aggregate,Grasage) algorithm can improve the speed of invisible node generation embedding,but it is easy to introduce noise data,and the representation ability of generated node embedding is not high.In this paper,an inductive learning algorithm based on KNN and matrix transformation for graph node embedding is proposed.Firstly,K neighbo-ring nodes are selected by KNN.Then aggregation information is generated by aggregation function.Finally,aggregation information and node information are calculated by matrix transformation and full connection layer,and new node embedding is obtained.In order to balance computing time and performance effectively,this paper proposes a new aggregation function,which uses maximum pooling as aggregation information output for neighbor node features,retains more neighbor node information and reduces computing cost.Experiments on two data sets of reddit and PPI show that the proposed algorithm achieves 4.995% and 10.515% improvement on micro-f1 and macro-f1,respectively.The experimental data fully show that the algorithm can greatly reduce noise data,improve the representation ability of node embedding,and quickly and effectively generate node embedding for invisible nodes and invisible graphs.

Key words: Aggregation function, KNN, Low dimensional embedding, Node embedding, Representation ability

中图分类号: 

  • TP391
[1]CAO S S,LU W,XU Q K.grarep:learning graph representations with global structural information[C]//Proceedings of the 24th ACM International on Conference on Information and Knowledge Management.ACM,2015:891-900.
[2]PEROZZI B,AL R R,SKIENA S.Deepwalk:Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Association for Computing Machinery,2014:701-710.
[3]ZHANG W P,LI Z J,LI R H,et al.Graph Structure Clustering Algorithm Based on MapReduce[J].Journal of Software,2018,29(3):627-641.
[4]WANG D,CUI P,ZHU W.Structural deep network embedding[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Association for Computing Machinery,2016:1225-1234.
[5]NG A Y,JORDAN M I,WEISS Y,et al.On spectral clustering:Analysis and an algorithm[C]//Proceedings from the Confe-rence,Neural Information Processing Systems 2001.Cambridge:Massachusetts Institute of Technology Press,2001:849-856.
[6]WILLIAM L,HAMILTON,YING Z T,et al.Inductive Repre-sentation Learning on Large Graphs[C]//Advances in Neural Information Processing Systems 30:Annual Conference on Neural Information Processing Systems 2017.2017:1024-1034.
[7]BANU S,CEMNUR A,JOOF S,et al.Microwave dielectricproperty based classification of renal calculi:Application of a kNN algorithm[J].Computers in biology and medicine,2019,98(1):4-24.
[8]ESTEBAN B,PATRICE A,PAULO G.PageRank for semi-supervised learning[J].Applied Network Science,2019,4(1):1-20.
[9]SHERVASHIDZE N,SCHWEITZER P,BORGWARDT K M,et al.Weisfeilerlehman graph kernels[J].Journal of Machine Learning Research,2011,12(3):2539-2561.
[10]DAI H,DAI B,SONG L.Discriminative embeddings of latentvariable models for structured data[C]//Proceedings of the 33nd International Conference on Machine Learning.JMLR,2016:2702-2711.
[11]GORI M,MONFARDINI G,SCARSELLI F.A new model for learning in graph domains[C]//IEEE International Joint Conference on Neural Networks.IEEE,2005:729-734.
[12]XU B B,CEN K T,HUANG J J,et al.A Survey of Convolution Neural Networks[J].Journal of Computer Science,2019,26 (10):1-31.
[13]SCARSELLI F,GORI M,TSOI A C,et al.The graph neural network model[J].IEEE Transactions on Neural Networks,2009,20(1):61-80.
[14]BRUNA J,ZAREMBA W,SZLAM A,et al.Spectral networks and locally connected networks on graphs[C]//proceedings from the 2nd International Conference on Learning Representations.ICLR,2014:2667-2674.
[15]KIPF T N,WELLING M.Semi-supervised classification withgraph convolutional networks[C]//5th International Conference on Learning Representations.OpenReview net,2016:5169-5179.
[16]NIEPERT M,AHMED M,KUTZKOV K.Learning convolu-tional neural networks for graphs[C]//Proceedings of the 33nd International Conference on Machine Learning.JMLR,2016:2014-2023.
[17]CHIEN C T,SU L L.Design of orthogonal graph filter bankwith known eigenvalues of Laplacian matrix[J].IET Signal Processing,2019,13(5):551-561.
[18]LAN Y D,DENG H F,CHEN T.Semi Supervised ClusteringAlgorithm Based on Graph Contraction[J].Computer Science,2012,39(4):236-239.
[19]DUWENAUD D K,MACLAURIN D,IPARRAGUIRRE J,et al.Convolutional networks on graphs for learning molecular fingerprints[C]//Advances in Neural Information Processing Systems 28:Annual Conference on Neural Information Proces-sing Systems 2015.computer science bibliography,2015:2224-2232.
[20]DANILO A,LUIGI C,GIAN L F,et al.A Shape Comparison Reinforcement Method Based on Feature Extractors and F1-Score[C]//Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems.ACM,2017:2594-2605.
[1] 张仁杰, 陈伟, 杭梦鑫, 吴礼发.
基于变分自编码器的不平衡样本异常流量检测
Detection of Abnormal Flow of Imbalanced Samples Based on Variational Autoencoder
计算机科学, 2021, 48(7): 62-69. https://doi.org/10.11896/jsjkx.200600022
[2] 赵志强, 易秀双, 李婕, 王兴伟.
基于GR-AD-KNN算法的IPv6网络DoS入侵检测技术研究
Research on DoS Intrusion Detection Technology of IPv6 Network Based on GR-AD-KNN Algorithm
计算机科学, 2021, 48(6A): 524-528. https://doi.org/10.11896/jsjkx.200500001
[3] 黄铭, 孙林夫, 任春华, 吴奇石.
改进KNN的时间序列分析方法
Improved KNN Time Series Analysis Method
计算机科学, 2021, 48(6): 71-78. https://doi.org/10.11896/jsjkx.200500044
[4] 包宗铭, 龚声蓉, 钟珊, 燕然, 戴兴华.
基于双向KNN排序优化的行人再识别算法
Person Re-identification Algorithm Based on Bidirectional KNN Ranking Optimization
计算机科学, 2019, 46(11): 267-271. https://doi.org/10.11896/jsjkx.181001861
[5] 何佶星,陈汶滨,牟斌皓.
流行度划分结合平均偏好权重的协同过滤个性化推荐算法
Coordination Filtering Personalized Recommendation Algorithm Considering Average
Preference Weight and Popularity Division
计算机科学, 2018, 45(6A): 493-496.
[6] 李思瑶, 周海芳, 方民权.
基于GPU的图像监督分类算法的研究
Research of Image Classification Algorithm Based on GPU
计算机科学, 2018, 45(6A): 143-145.
[7] 陈静杰,车洁.
基于标准欧氏距离的燃油流量缺失数据填补算法
Fuel Flow Missing-value Imputation Method Based on Standardized Euclidean Distance
计算机科学, 2017, 44(Z6): 109-111. https://doi.org/10.11896/j.issn.1002-137X.2017.6A.023
[8] 冯飞,姜宝华,刘培学,陈玉杰.
改进2DPCA算法在人脸识别中的应用
Application of Improved 2DPCA Algorithm in Face Recognition
计算机科学, 2017, 44(Z11): 267-268. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.056
[9] 薛忠斌,白利光,何宁,周烜,周歆,王珊.
路网中高吞吐量移动对象实时查询算法
Throughput Oriented Real-time Query Processing Algorithm for Moving Objects in Road Network
计算机科学, 2017, 44(3): 16-19. https://doi.org/10.11896/j.issn.1002-137X.2017.03.004
[10] 王立,王欣,马朝东.
一种基于本体KNN的分布式缓存数据交换策略
Distributed Caching Strategy for Data Exchange Program Based on Ontology and KNN Algorithm
计算机科学, 2016, 43(Z11): 316-319. https://doi.org/10.11896/j.issn.1002-137X.2016.11A.074
[11] 张丽平,经海东,李松,崔环宇.
障碍空间中基于Voronoi图的k最近邻查询
k Nearest Neighbor Query Based on Voronoi Diagram for Obstructed Spaces
计算机科学, 2016, 43(5): 174-178. https://doi.org/10.11896/j.issn.1002-137X.2016.05.032
[12] 华辉有,陈启买,刘海,张阳,袁沛权.
一种融合Kmeans和KNN的网络入侵检测算法
Hybrid Kmeans with KNN for Network Intrusion Detection Algorithm
计算机科学, 2016, 43(3): 158-162. https://doi.org/10.11896/j.issn.1002-137X.2016.03.030
[13] 王培重,郑南山,张言哲.
基于动态K值及AP MAC地址筛选的室内定位算法
Indoor Positioning Algorithm Based on Dynamic K Value and AP MAC Address Match
计算机科学, 2016, 43(1): 163-165. https://doi.org/10.11896/j.issn.1002-137X.2016.01.037
[14] 徐晓丹,姚明海,刘华文,郑忠龙.
基于kNN的多标签分类预处理方法
Pre-processing Method of Multi-label Classification Based on kNN
计算机科学, 2015, 42(5): 106-108. https://doi.org/10.11896/j.issn.1002-137X.2015.05.021
[15] 钱燕燕,李永忠,余西亚.
基于多标记与半监督学习的入侵检测方法研究
Intrusion Detection Method Based on Multi-label and Semi-supervised Learning
计算机科学, 2015, 42(2): 134-136. https://doi.org/10.11896/j.issn.1002-137X.2015.02.029
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!