计算机科学 ›› 2024, Vol. 51 ›› Issue (3): 90-101.doi: 10.11896/jsjkx.221200029

• 数据库&大数据&数据科学 • 上一篇    下一篇

异质信息网络中基于解耦图神经网络的社区搜索

陈伟, 周丽华, 王亚峰, 王丽珍, 陈红梅   

  1. 云南大学信息学院 昆明650500
  • 收稿日期:2022-12-05 修回日期:2023-05-16 出版日期:2024-03-15 发布日期:2024-03-13
  • 通讯作者: 周丽华(lhzhou@ynu.edu.cn)
  • 作者简介:(1121137902@qq.com)
  • 基金资助:
    国家自然科学基金(62062066,61762090,61966036,62276227);云南省基础研究计划重点项目(202201AS070015);云南省智能系统与计算重点实验室项目(202205AG070003);云南省教育厅区块链与数据安全治理工程研究中心项目;云南省物联网技术与应用大学重点实验室项目

Community Search Based on Disentangled Graph Neural Network in Heterogeneous Information Networks

CHEN Wei, ZHOU Lihua, WANG Yafeng, WANG Lizhen, CHEN Hongmei   

  1. School of Information Science and Engineering,Yunnan University,Kunming 650500,China
  • Received:2022-12-05 Revised:2023-05-16 Online:2024-03-15 Published:2024-03-13
  • About author:CHEN Wei,born in 1999,postgraduate.His main research interests include data mining and social network analysis.ZHOU Lihua,born in 1968,Ph.D,professor,Ph.D supervisor.Her main research interests include data mining,social network analysis,and artificial intelligence.
  • Supported by:
    National Natural Science Foundation of China(62062066,61762090,61966036,62276227), Yunnan Fundamental Research Projects (202201AS070015),Yunnan Key Laboratory of Intelligent Systems and Computing Project(202205AG070003),Blockchain and Data Security Governance Engineering Research Center Project of Yunnan Provincial Department of Education and University Key Laboratory of Internet of Things Technology and Application Project of Yunnan Province.

摘要: 在异质信息网络(HINs)中搜索包含给定查询节点的社区具有广泛的应用价值,如好友推荐、疫情监控等。现有HINs社区搜索方法大多基于预定义的子图模式对社区的拓扑结构施加一个严格的要求,忽略了节点间的属性相似性,导致结构关系弱而属性相似性高的社区难以定位,并且采用的全局搜索模式难以有效处理大规模的网络数据。为解决这些问题,首先设计解耦图神经网络和基于元路径的局部模块度,分别用于度量节点间的属性相似性和结构内聚性,并利用0/1背包问题优化属性和结构两种凝聚性度量指标,定义了最有价值的c大小社区搜索问题,进而提出了一种基于解耦图神经网络的价值最大化社区搜索模型,执行3个阶段的搜索过程。第一阶段,依据查询信息与元路径,构造候选子图,将搜索范围控制在查询节点的局部范围内,保证整个模型的搜索效率;第二阶段,利用解耦图神经网络,融合异质图信息和用户标签信息,计算节点间的属性相似度;第三阶段,根据社区定义以及凝聚性度量指标,设计贪心算法查找属性相似度高且结构凝聚的c大小社区。最后,在真实的同质和异质网络数据集上测试了搜索模型的性能,大量实验结果验证了模型的有效性和高效性。

关键词: 异质信息网络, 社区搜索, 解耦图神经网络, 元路径, 局部模块度

Abstract: Searching the community containing a given query node in heterogeneous information networks(HINs) has a wide range of application values,such as friend recommendation,epidemic monitoring and so on.However,most of the existing HINs community search methods impose strict requirements on the topology of the community based on the predefined subgraph pattern,ignoring the attribute similarity between nodes,which will be difficult to locate the community with weak structural relationship and high attribute similarity.And the global search mode is difficult to effectively deal with large-scale network data.To solve these problems,we design disentangled graph neural network and the local modularity based on meta path to measure the attribute similarity and structural cohesion between nodes respectively.Moreover,we use the 0/1 knapsack problem to optimize the impact of the attribute and structure on the community,define the most valuable c-size community search problem,and then propose a value maximization community search algorithm based on disentangled graph neural network to perform a three-stage search process.In the first stage,we construct candidate subgraphs according to the query in-formation and meta-path,control the search range within the local range of the query vertex to ensure the search efficiency of the whole algorithm.In the second stage,we use the disentangled graph neural network to fuse the heterogeneous information and user label information to calculate the attribute similarity between nodes.In the third stage,we design a greedy algorithm to find the c-size community with high attribute similarity and structural cohesion according to the community definition and cohesion measurement indicator.Finally,we test the performance of algorithm on real homogeneous and heterogeneous data sets,and a large number of experimental results demonstrate the effectiveness and efficiency of the proposed model.

Key words: Heterogeneous information networks, Community search, Disentangled graph neural network, Meta-paths, Local mo-dularity

中图分类号: 

  • TP301
[1]QIAO L,ZHANG Z,YUAN Y,et al.Keyword-Centric Community Search over Large Heterogeneous Information Networks[M]//Lecture Notes in Computer Science:12681 LNCS.2021:158-173.
[2]FANG Y,YANG Y,ZHANG W,et al.Effective and efficientcommunity search over large heterogeneous information networks[J].Proceedings of the VLDB Endowment,2020,13(6):854-867.
[3]ZHU J C,WANG C K.Approaches to community search under complex conditions[J].Journal of Software,2019,30(3):21.
[4]YANG Y,FANG Y,LIN X,et al.Effective and Efficient Truss Computation over Large Heterogeneous Information Networks[C]//2020 IEEE 36th International Conference on Data Engineering(ICDE).IEEE,2020:901-912.
[5]GAO J,CHEN J,LI Z,et al.ICS-GNN:Lightweight Interactive Community Search via Graph Neural Network[J].Proceedings of the VLDB Endowment,2021,14(6):1006-1018.
[6]CLAUSET A.Finding local community structure in networks[J].Physical Review E,2005,72(2):026132.
[7]FANG Y,WANG K,LIN X,et al.Cohesive Subgraph Searchover Big Heterogeneous Information Networks:Applications,Challenges,and Solutions[C]//Proceedings of the 2021 International Conference on Management of Data.New York,NY,USA:ACM,2021:2829-2838.
[8]MA J,CUI P,KUANG K,et al.Disentangled graph convolutional networks[C]//36th International Conference on Machine Learning.ICML,2019:7454-7463.
[9]ZHOU L H,WANG J L,WANG L Z,et al.Heterogeneous Information Network Representation Learning:A Survey[J].Chinese Journal of Computers,2022,45(1):160-189.
[10]LIU B,ZHANG F,ZHANG W,et al.Efficient CommunitySearch with Size Constraint[C]//2021 IEEE 37th International Conference on Data Engineering(ICDE).IEEE,2021:97-108.
[11]HE Y Z,WANG X Z,LI W B,et al.Research on Genetic Algorithms for the Discounted {0-1} Knapsack Problem[J].Chinese Journal of Computers,2022,39(12):2614-2630.
[12]REN Y,LIU B,HUANG C,et al.Heterogeneous Deep GraphInfomax[C]//8th International Conference on Learning Representations.New Orleans:ICLR,2020.
[13]SHI C,WANG R J,WANG X.A Survey of Heterogeneous Information Networks Analysis and Applications[J].Journal of Software,2022,33(2):598-621.
[14]VELIČKOVIĆ P,FEDUS W,HAMILTON W L,et al.Deep Graph Infomax[C]//7th International Conference on Learning Representations.New Orleans:ICLR,2019.
[15]JIANG Y,RONG Y,CHENG H,et al.Query-Driven GraphConvolutional Networks for Attributed Community Search[C]//48th International Conference on Very Large Databases.Sydney:PVLDB,2022.
[16]JIAN X,WANG Y,CHEN L.Effective and efficient relational community detection and search in large dynamic heterogeneous information networks[J].Proceedings of the VLDB Endowment,2020,13(10):1723-1736.
[17]YANG J,LESKOVEC J.Defining and evaluating network communities based on ground-truth[J].Knowledge and Information Systems,2015,42(1):181-213.
[18]FORTUNATO S.Community detection in graphs[J].PhysicsReports,2009,486(3/4/5):27-38.
[19]KIPF T N,WELLING M.Semi-Supervised Classification withGraph Convolutional Networks[C]//5th International Confe-rence on Learning Representations.Toulon,France:ICLR 2017.
[20]HAMILTON W L,YING R,LESKOVEC J.Inductive Representation Learning on Large Graphs[J].Advances in Neural Information Processing Systems.2017,30(1):1025-1035.
[21]DU J,ZHANG S,WU G,et al.Topology Adaptive Graph Con-volutional Networks[J].arXiv.1710.10370,2017.
[22]CHIANG W L,LIU X,SI S,et al.Cluster-GCN:An Efficient Algorithm for Training Deep and Large Graph Convolutional Networks[C]//Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.New York,NY,USA:ACM,2019:257-266.
[23]FANG Y,HUANG X,QIN L,et al.A Survey of Community Search Over Big Graphs[J].The VLDB Journal,2020,29(1):353-392.
[24]CHEN Y Q,QIAN T Y,LI W L,et al.Exploiting Composite Relation Graph Convolution for Attributed Network Embedding[J].Journal of Computer Research and Development,2020,57(8):1674-1682.
[25]LI S S,GUO J F,ZHENG C,et al.Attraction Classification Based on User Reviews and Heterogeneous Graph Neural Network[J].Journal of Chinese Computer Systems,2023,44(9):1884-1891.
[26]ZHAO Y H,HAN L W.Heterogeneous Networks Overlapping Community Discovery Algorithm Based on Network Embedding[J].Journal of Chinese Computer Systems,2021,42(12):2660-2665.
[27]ZHOU Y,FANG Y,LUO W,et al.Influential CommunitySearch over Large Heterogeneous Information Networks[J].Proceedings of the VLDB Endowment,2023,16(8):2047-2060.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!