计算机科学 ›› 2024, Vol. 51 ›› Issue (11): 95-102.doi: 10.11896/jsjkx.231000130

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于距离泛化的二分图(α,β)-core高效分解算法

张毅豪1, 华征宇1, 袁龙1, 张帆2, 王凯3, 陈紫4   

  1. 1 南京理工大学计算机科学与工程学院 南京 210094
    2 广州大学网络空间安全学院 广州 510555
    3 上海交通大学安泰经济与管理学院 上海 200030
    4 南京航空航天大学计算机科学与技术学院 南京 211106
  • 收稿日期:2023-10-20 修回日期:2024-03-11 出版日期:2024-11-15 发布日期:2024-11-06
  • 通讯作者: 袁龙(longyuan@njust.edu.cn)
  • 作者简介:(yhzhang@njust.edu.cn)
  • 基金资助:
    国家重点研发计划(2022YFF0712100);信息系统工程重点实验室项目(WDZC20205250411);国家自然科学基金青年科学基金(NSFC61902184)

Distance-generalized Based (α,β)-core Decomposition on Bipartite Graphs

ZHANG Yihao1, HUA Zhengyu1, YUAN Long1, ZHANG Fan2, WANG Kai3, CHEN Zi4   

  1. 1 School of Computer Science and Technology,Nanjing University of Science and Technology,Nanjing 210094,China
    2 School of Cybersecurity,Guangzhou University,Guangzhou 510555,China
    3 Antai College of Economics & Management,Shanghai Jiao Tong University,Shanghai 200030,China
    4 College of Computer Science and Technology,Nanjing University of Aeronautics and Astronautics,Nanjing 211106,China
  • Received:2023-10-20 Revised:2024-03-11 Online:2024-11-15 Published:2024-11-06
  • About author:ZHANG Yihao,born in 1999,postgra-duate.His main research interests include graph data management and ana-lysis,and so on.
    YUAN Long,born in 1988,Ph.D,professor,is a senior member of CCF(No.B0267S).His main research interests include graph data management and analysis,and so on.
  • Supported by:
    National Key Research and Development Program of China(2022YFF0712100),Key Laboratory Program of Information Systems Engineering(WDZC20205250411) and Young Scientists Fund of the National Natural Science Foundation of China(NSFC61902184).

摘要: (α,β)-core分解作为图数据管理与分析研究中的热点问题,已经被广泛应用于电商欺诈检测和兴趣群组推荐等实际场景中。然而现有(α,β)-core模型在构建时仅考虑顶点距离为1的邻居,难以刻画出二部图社区中的细粒度信息。针对此问题,提出了基于距离泛化的(α,β,h)-core模型,即由二部图中两个不相交的顶点集构成一个最大子图,满足一个集合中的任何一个顶点至少有α个与它的距离不大于h的邻居顶点,另一个集合中的任何一个顶点至少有β个与它的距离不大于h的邻居顶点。通过引入距离为h的邻居,解决了(α,β)-core模型细粒度刻画能力不足的问题。由于新模型需要考虑距离不大于h的邻居,因此(α,β,h)-core分解变得更为困难。为此,提出了基于计算共享的分解策略,据此设计了高效的(α,β,h)-core分解算法,并分析了算法性能。考虑到确定距离不大于h的邻居顶点非常耗时,还提出一种(α,β,h)-core下界以减少重复计算距离不大于h的邻居顶点,进一步提高计算效率。在8个真实图数据上的对比实验结果验证了新模型的有效性和算法的高效性。

关键词: 二部图, (α,β,h)-core分解, 高效算法

Abstract: (α,β)-core decomposition is a fundamental problem in graph analysis,and has been widely adopted for e-commerce fraud detection and interest group recommendation.Nevertheless,(α,β)-core model only considers the distance-1 neighborhood,which makes it unable to provide more fine-grained structure information.Motivated by this,(α,β,h)-core model is proposed in this paper,which requires the vertices in one/another part has at least α other vertices at distance not greater than h within the subgraph.Due to the distance-h neighborhoods being considered,the new model can identify more fine-grained structure information as verified in our experiments,which also makes the corresponding (α,β,h)-core decomposition challenging.To address this problem,an efficient algorithm based on computation-sharing strategy is proposed and time complexity is analyzed accordingly.As obtaining neighbors within distance h is time-consuming,a lower bound related to (α,β,h)-core is designed to avoid unnecessary distance-h neighbors computation to further improve the computational efficiency.Experimental results on eight real graphs de-monstrate the effectiveness of the proposed model and efficiency of its algorithm.

Key words: Bipartitegraphs, (α, β, h)-core decomposition, Efficient algorithm

中图分类号: 

  • TP301
[1] LISMONT J,RAM S,VANTHIENEN J,et al.Predicting interpurch-ase time in a retail environment using customer-product networks:An empirical study and evaluation[J].Expert Systems with Applications,2018,104:22-32.
[2] KIM J,HASTAK M.Social network analysis:Characteristics of online social networks after a disaster[J].International Journal of Information Management,2018,38(1):86-96.
[3] YANG J,LI Y,LIU Q,et al.Brief introduction of medical data-base and data mining technology in big data era[J].Journal of Evidence-Based Medicine,2020,13(1):57-69.
[4] ZHOU F,YIN M M,JIAO C N,et al.Bipartite graph-based colla-borative matrix factorization method for predicting miRNA-dis-ease associations[J].BMC Bioinformatics,2021,22(1):1-16.
[5] XIA W,GAO Q,WANG Q,et al.Tensorized bipartite graph learning for multi-view clustering[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2022,45(4):5187-5202.
[6] WANG K,LIN X,QIN L,et al.Accelerated butterfly counting with vertex priority on bipartite graphs[J].The VLDB Journal,2023,32(2):257-281.
[7] CHEN C,ZHU Q,WU Y,et al.Efficient critical relationshipsidentification in bipartite networks[J].World Wide Web,2022,25(2):741-761.
[8] LIU B,YUAN L,LIN X,et al.Efficient(α,β)-core computation in bipartite graphs[J].The VLDB Journal,2020,29(5):1075-1099.
[9] BATAGELJ V,ZAVERSNIK M.An o(m) algorithm for coresdecomposition of networks[J].arXiv preprint cs/0310049,2003.
[10] KONG Y X,SHI G Y,WU R J,et al.k-core:Theories and ap-plications[J].Physics Reports,2019,832:1-32.
[11] WANG K,ZHANG W,LIN X,et al.Efficient and effective com-munity search on large-scale bipartite graphs[C]//2021 IEEE 37th International Conference on Data Engineering(ICDE).IEEE,2021:85-96.
[12] LIU B,YUAN L,LIN X,et al.Efficient(α,β)-core computation:An index-based approach[C]//The World Wide Web Conference.2019:1130-1141.
[13] MALLIAROS F D,GIATSIDIS C,PAPADOPOULOS A N,et al.The core decomposition of networks:Theory,algorithms and applications[J].The VLDB Journal,2020,29:61-92.
[14] CHAN T H H,SOZIO M,SUN B.Distributed approximate k-core decomposition and min-max edge orientation:Breaking the diameter barrier[J].Journal of Parallel and Distributed Compu-ting,2021,147:87-99.
[15] LIU J,XU C,YIN C,et al.K-core based temporal graph convo-lutional network for dynamic graphs[J].IEEE Transactions on Knowledge and Data Engineering,2020,34(8):3841-3853.
[16] DING D,LI H,HUANG Z,et al.Efficient fault-tolerant group recommendation using alpha-beta-core[C]//Proceedings of the 2017 ACM on Conference on Information and Knowledge Ma-nagement.2017:2047-2050.
[17] ZHANG Y,PHILLIPS C A,ROGERS G L,et al.On finding bicliques in bipartite graphs:a novel algorithm and its application to the integration of diverse biological data types[J].BMC Bioinformatics,2014,15:1-18.
[18] ALLAHBAKHSH M,IGNJATOVIC A,BENATALLAH B,et al.Collusion detection in online rating systems[C]//Web Technologies and Applications:15th Asia-Pacific Web Confe-rence,APWeb 2013,Sydney Proceedings 15.Springer Berlin Heidelberg,2013:196-207.
[19] BEUTEL A,XU W,GURUSWAMI V,et al.Copycatch:stopping group attacks by spotting lockstep behavior in social networks[C]//Proceedings of the 22nd International Conference on World Wide Web.2013:119-130.
[20] BONCHI F,KHAN A,SEVERINI L.Distance-generalized core decomposition[C]//Proceedings of the 2019 International Conference on Management of Data.2019:1006-1023.
[21] TATTI N.Fast computation of distance-generalized cores using sampling[J].Knowledge and Information Systems,2023,65(6):2429-2453.
[22] CORMEN T H,LEISERSON C E,RIVEST R L,et al.Introduction to algorithms[M].MIT Press,2022.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!