计算机科学 ›› 2023, Vol. 50 ›› Issue (8): 16-26.doi: 10.11896/jsjkx.220600262

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于异构信息网络的最大影响力社区搜索

杜明, 杨雯, 周军锋   

  1. 东华大学计算机科学与技术学院 上海 201620
  • 收稿日期:2022-06-28 修回日期:2022-11-17 出版日期:2023-08-15 发布日期:2023-08-02
  • 通讯作者: 周军锋(zhoujf@dhu.edu.cn)
  • 作者简介:(duming@dhu.edu.cn)
  • 基金资助:
    上海自然科学基金(20ZR1402700);国家自然科学基金 (61472339,61873337)

Maximum Influential Community Search in Heterogeneous Information Network

DU Ming, YANG Wen, ZHOU Junfeng   

  1. School of Computer Science and Technology,Donghua University,Shanghai 201620,China
  • Received:2022-06-28 Revised:2022-11-17 Online:2023-08-15 Published:2023-08-02
  • About author:DU Ming,born in 1975,Ph.D,professor,is a member of China Computer Federation.His main research interests include natural language processing,information query and data analysis.
    ZHOU Junfeng,born in 1977,Ph.D,professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include information retrieval technology,semi structured data,query processing and optimization of graphic data.
  • Supported by:
    Natural Science Foundation of Shanghai,China(20ZR1402700) and National Natural Science Foundation of China(61472339,61873337).

摘要: 异构信息网络能够对对象类型多样、交互复杂的数据系统进行有效建模。基于异构信息网络进行社区搜索的研究,通常以顶点类型、最小度和网络结构为中心建立社区模型,查询内聚子图。但现有研究存在两个问题:1)未考虑网络中隐藏的另一种自然属性——影响力对查询结果的影响;2)忽略了用户对查询结果规模上限的要求,导致查询结果与用户预期不匹配。为此,对与影响力信息相结合的异构信息网络进行研究,并提出组合约束模型作为该类网络的社区内聚性度量标准。为解决基于组合约束模型的社区搜索问题,提出了两种通过预处理和剪枝策略进行优化的搜索算法。最后,在8个数据集上对所提算法的有效性和高效性进行验证。

关键词: 异构信息网络, 社区搜索, 社区内聚模型, 影响力

Abstract: Heterogeneous information network can effectively model data systems,which have diverse object types and complex interactions.Research on community search based on heterogeneous information networks usually builds community models centered on vertex type,minimum degree and network structure,then the cohesive subgraph is queried.However,there are two pro-blems in the existing researches:1)the influence value,another natural attribute hidden in networks is not considered;2)the user'srequirement for the upper limit of the query result scale is ignored too,resulting in the query result do not match user's expectation.Therefore,this paper studies the heterogeneous information networks combined with influence value,and proposes a combined constraint model as a measure of community cohesion for such networks.To solve the community search problems based on the combined constraint model,this paper proposes two search algorithms optimized by preprocessing and pruning strategies.Finally,the effectiveness and efficiency of our method are verified on 8 real data sets.

Key words: Heterogeneous information network, Community search, Community cohesive model, Influence value

中图分类号: 

  • TP311
[1]SHI C,LI Y,ZHANG J,et al.A Survey of Heterogeneous Information Network Analysis[J].IEEE Transactions on Know-ledge and Data Engineering,2016,29(1):17-37.
[2]SUN Y,HAN J.Mining heterogeneous information networks:a structural analysis approach[J].ACM SIGKDD Explorations Newsletter,2013,14(2):20-28.
[3]FANG Y,WANG K,LIN X,et al.Cohesive Subgraph Searchover Big Heterogeneous Information Networks:Applications,Challenges,and Solutions[C]//SIGMOD Conference.2021:2829-2838.
[4]ZHOU A,WANG Y,CHEN L.Finding Large Diverse Communities on Networks:The Edge Maximum k*-Partite Clique[J].Proceedings of the VLDB Endowment(PVLDB),2020,3(11):2576-2589.
[5]ZHU R,ZOU Z,LI J.Diversified Coherent Core Search onMulti-Layer Graphs[C]//2018 IEEE 34th International Confe-rence on Data Engineering (ICDE).IEEE,2018:701-712.
[6]TORE G,IRNICH S,ZIMMERMANN H,et al.Finding all k-cliques in k-partite graphs,an application in textile engineering[J].Computers & Operations Research,2002,29(1):13-31.
[7]DOURISBOURE Y,GERACI F,PELLEGRIMI M.Extraction and classification of dense implicit communities in the Web graph[J].ACM Transactions on the Web,2009,3(2),7:1-7:36.
[8]PALLA G,DERANVI I,FARKAS I,et al.Uncovering theoverlapping community structure of complex networks in nature and society[J].Nature,2005,435(7043):814.
[9]FANG Y,YANG Y,ZHANG W,et al.Effective and efficientcommunity search over large heterogeneous information networks[J].Proceedings of the VLDB Endowment,2020,13(6):854-867.
[10]YANG Y,FANG Y,LIN X,et al.Effective and Efficient Truss Computation over Large Heterogeneous Information Networks[C]//2020 IEEE 36th International Conference on Data Engineering(ICDE).IEEE,2020:901-912.
[11]JIAN X,WANG Y,CHEN L.Effective and efficient relational community detection and search in large dynamic heterogeneous information networks[J].Proceedings of the VLDB Endowment(PVLDB),2020,13(10):1723-1736.
[12]WU Y,ZHAO J,SUN R,et al.Efficient Personalized Influential Community Search in Large Networks[J].Data Science and Engineering,2021,6(3):310-322.
[13]LIU Z,XIANG B,GUO W,et al.Overlapping Community De-tection Algorithm Based on Coarsening and Local Overlapping Modularity[J].IEEE Access,2019,7:57943-57955.
[14]LI R H,LU Q,XU J,et al.Influential Community Search inLarge Networks[J].Proceedings of the VLDB Endowment (PVLDB),2015,8(5):509-520.
[15]GANESH P,DINGLIWAL S,AGARVAL R.Literature Survey on Finding Influential Communities in Large Scale Networks[J].arXiv:1902.01629,2019.
[16]SOZIO M,GIONIS A.The community-search problem and how to plan a successful cocktail party[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.ACM,2010:939-948.
[17]YAO K,CAHNG L.Efficient size-bounded community searchover large networks[J].Proceedings of the VLDB Endowment(PVLDB),2021,14(8):1441-1453.
[18]FANG Y,YANG Y,ZHANG W,et al.Effectiveand efficientcommunity search over large heterogeneous information networks[C]//Proceedings of the VLDB Endowment.2020:854-867.
[19]YANG Y,FANG Y,LIN X,et al.Effective and Efficient Truss Computation over Large Heterogeneous Information Networks[C]//2020 IEEE 36th International Conference on Data Engineering (ICDE).IEEE,2020:901-912.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!