计算机科学 ›› 2020, Vol. 47 ›› Issue (9): 88-93.doi: 10.11896/jsjkx.190800122

• 数据库&大数据&数据科学 • 上一篇    下一篇

历史图上基于CSR结构的PageRank算法

潘培贤, 邹兆年, 李发明   

  1. 哈尔滨工业大学计算机科学与技术学院 哈尔滨150001
  • 收稿日期:2019-08-26 发布日期:2020-09-10
  • 通讯作者: 邹兆年(znzou@hit.edu.cn)
  • 作者简介:pan_rose@yeah.net
  • 基金资助:
    国家自然科学基金面上项目(61672189);国家自然科学基金重点项目(61532015)

CSR-based PageRank Algorithm on Historical Graphs

PAN Pei-xian, ZOU Zhao-nian, LI Fa-ming   

  1. School of Computer Science and Technology,Harbin Institute of Technology,Harbin 150001,China
  • Received:2019-08-26 Published:2020-09-10
  • About author:PAN Pei-xian,born in 1996,master candidate.His main research interests include graph data mining and so on.
    ZOU Zhao-nian,born in 1979,Ph.D,associate professor,Ph.D supervisor,is a member of China Computer Federation.His main research interests include big data,graph databases and data mining.
  • Supported by:
    Surface Project of National Natural Science Foundation of China (61672189) and Key Program of National Natural Science Foundation of China (61532015).

摘要: 近年来,学者们对静态图的研究越来越全面、深入,已经形成了完善的理论体系。但是,对于生活中的一些应用问题,如社交网络中不断变化的关系等,使用静态图表示此类动态变化的关系似乎显得有些乏力。而历史图可以表示动态的变化。PageRank算法是用于衡量网页重要程度的算法,而网络中不断有网站新建或删除,这样的网络用历史图来表示更为合适,因此考虑在历史图上利用CSR(Compressed Sparse Row)结构实现PageRank,使得程序能够给出几个目标时间上各网站的评分,进而能够提供网站评分的变化情况,给出网站影响力趋势的预测。在Wekipedia提供的网页互相连接的Hyperlink networks数据集上,将所提方法与在链表上实现PageRank算法做比较,结果显示其性能大大优于使用链表的结构,并且随着数据规模和目标时间规模的增大,其优势将会越来越明显。

关键词: CSR结构, PageRank, 历史图

Abstract: In recent years,the research on static graphs has become more and more comprehensive and in-depth,and a perfect theoretical system has been formed.However,for some application issues in life,such as the changing relationships in social networks,it seems a bit weak to use static graphs to show that such moments are changing.Historical graph can be used to represent dynamic changes.The PageRank algorithm is an algorithm for measuring the importance of a web page,and there are constantly new or deleted websites in the network.Considering all these factors,such a network is quite appropriate to be represented by historical graphs.Therefore,this paper considers using the CSR (Compressed Sparse Row) structure to implement PageRank on the history graph,so that the program can give the scores of each website at several target times.In turn,it can provide changes in website ratings and give forecasts of website influence trends.Comparing its performance with the PageRank algorithm implemented on the linked list based on the hyperlinks network dataset provided by Wekipedia,the results show that its performance is much better than the use of linked list structure,and its advantages will become more and more obvious as the data size and target time scale increase.

Key words: Compressed Sparse Row structure, Historical graph, PageRank

中图分类号: 

  • TP311
[1] WU J P,LIN S,XU K,et al.Advances in Evolvable New Generation Internet Architecture[J].Chinese Journal of Computer,2012,35(6):1094-1108.
[2] PRZYTYCKA T M,SINGH M,SLONIM D K.Toward the dynamic interactome:It's about time[J].Briefings in Bioinforma-tics,2010,11(1):15-29.
[3] CAO J.Analysis of Google’s PageRank technology[J].Journal of Information,2002(10):15-18.
[4] CHANG J W,DAI D H.Personalized Recommendation Algorithm Based on PageRank and Spectral Method[J].Computer Science,2018,45(S2):398-401.
[5] LI C S,LIU X G,JIAO H T,et al.An Improved PageRank Algorithm Based on APP Search System[J].Computer and Mo-dernization,2018(7):24-27,38.
[6] PEDROCHE F,GARCÍA E,ROMANCE M,et al.On the spectrum of two-layer approach and Multiplex PageRank[J].Journal of Computational and Applied Mathematics,2018,344:161-172.
[7] HARARY F,GUPTA G.Dynamic graph models[J].Mathematical and Computer Modelling,1997,25(7):79-87.
[8] DING L L,LI Z D,JI W T,et al.Reachability Query of Large Scale Dynamic Graph Based on Improved Huffman Coding[J].Acta Electronica Sinica,2017,45(2):359-367.
[9] WANG C,CHEN L.Continuous Subgraph Pattern Search over Graph Streams[C]//International Conference on Data Engineering.IEEE,2009.
[10] FAN W,LI J,LUO J,et al.Incremental graph pattern matching[C]//Acm Sigmod International Conference on Management of Data.ACM,2011.
[11] CHOUDHURY S,HOLDER L,CHIN G,et al.Proceedings of.
the Workshop on Dynamic Networks Management and Mining[C]//International Conference on Management of Data(SIGMOD/PODS’13).2013.
[12] WU L K,XIAO X K,DENG D X,et al.Shortest Path and Distance Queries on Road Networks:An Experimental Evaluation[J].Proceeding of the VLDB Endowment,2012,5(5):406-417.
[13] PAGE L,BRIN S,MOTWANI R,et al.The PageRank Citation Ranking:Bringing Order to the Web[J].Stanford Digital Libraries Working Paper,1998,9(1):1-14.
[14] WEI Z W,HE X D,X X K,et al.TopPPR:Top-k Personalized PageRank Queries with Precision Guarantees on Large Graphs[C]//International Conference on Management of Data.SIGMOD,2018.
[15] HAVELIWALATH.Topic-sensitive pagerank:A context-sensitive ranking algorithm for web search[J].IEEE Transactions on Knowledge and Data Engineering,2003,15(4):784-796.
[16] ROUL R K,SAHOO J K,ARORA K.Query-Optimized Page-Rank:A Novel Approach[M]//Computational Intelligence in Data Mining.Advances in Intelligent Systems and Computing.Singapore:Springer,2019:673-683.
[17] LI C C,KANG Z J,YU H G,et al.Identification Method of Key Nodes in Power System Based on Improved PageRank Algorithm[J].Transactions of China Electrotechnical Society,2019,34(9):168-175.
[18] LIU Z Y,LI Q F,CENG C,et al.An Optimization Method of PageRank Based on Spark and its Application Research[J].Journal of CAEIT,2018(4):399-405.
[1] 王新胜,马树章.
融合用户自身因素与互动行为的微博用户影响力计算方法
Method of Weibo User Influence Calculation Integrating Users’ Own Factors and Interaction Behavior
计算机科学, 2020, 47(1): 96-101. https://doi.org/10.11896/jsjkx.181202253
[2] 刘小捷, 吕晓强, 王晓玲, 张伟, 赵安.
基于维基百科类别图的推特用户兴趣挖掘
Mining User Interests on Twitter Using Wikipedia Category Graph
计算机科学, 2019, 46(9): 79-84. https://doi.org/10.11896/j.issn.1002-137X.2019.09.010
[3] 王嵘冰,安维凯,冯勇,徐红艳.
基于标签和PageRank的重要微博用户推荐算法
Important Micro-blog User Recommendation Algorithm Based on Label and PageRank
计算机科学, 2018, 45(2): 276-279. https://doi.org/10.11896/j.issn.1002-137X.2018.02.047
[4] 常家伟, 戴牡红.
基于PageRank和谱方法的个性化推荐算法
Personalized Recommendation Algorithm Based on PageRank and Spectral Method
计算机科学, 2018, 45(11A): 398-401.
[5] 齐玉东,何诚,袁伟.
基于PageRank的网站服务质量影响因素的重要性排序算法
Algorithm of Importance Ranking for Influencing Factors of Website Service Quality Based on PageRank
计算机科学, 2017, 44(Z11): 80-83. https://doi.org/10.11896/j.issn.1002-137X.2017.11A.015
[6] 薛竹君,杨树强,束阳雪.
基于实体关系网络的微博文本摘要
Microblog Text Summarization Based on Entity Relation Network
计算机科学, 2016, 43(9): 77-81. https://doi.org/10.11896/j.issn.1002-137X.2016.09.014
[7] 徐文涛,刘锋,朱二周.
基于MapReduce的新型微博用户影响力排名算法研究
Research on Novel Ranking Algorithm of Microblog User’s Influence Based on MapReduce
计算机科学, 2016, 43(9): 66-70. https://doi.org/10.11896/j.issn.1002-137X.2016.09.012
[8] 朱明峰,叶施仁,叶仁明.
基于Lex-PageRank的微博摘要优化方法
Extract Summarization Method Based on Lex-PageRank in Chinese Microblog
计算机科学, 2016, 43(9): 261-265. https://doi.org/10.11896/j.issn.1002-137X.2016.09.052
[9] 王冲,纪仙慧.
基于用户兴趣与主题相关的PageRank算法改进研究
Improved PageRank Algorithm Based on User Interest and Topic
计算机科学, 2016, 43(3): 275-278. https://doi.org/10.11896/j.issn.1002-137X.2016.03.051
[10] 白亮,于天元,刘湜,老松杨,杨征.
基于改进谱聚类方法的搜索引擎排序算法
Ranking Algorithm of Search Engine Using Improved Spectral Clustering
计算机科学, 2016, 43(10): 220-224. https://doi.org/10.11896/j.issn.1002-137X.2016.10.042
[11] 杨竣辉,刘宗田,刘 炜,苏小英.
基于文本事件网络自动摘要的抽取方法
Extraction Method of Text Summarization Based on Event Network
计算机科学, 2015, 42(3): 210-213. https://doi.org/10.11896/j.issn.1002-137X.2015.03.043
[12] 喻依,甘若迅,樊锁海,刘庆,邵晴.
基于PageRank算法和HITS算法的期刊评价研究
Journal Evaluation Based on PageRank Algorithm and HITS Algorithm
计算机科学, 2014, 41(Z6): 110-113.
[13] 周咏梅,阳爱民,杨佳能.
一种新闻评论情感词典的构建方法
Construction Method of Sentiment Lexicon for News Reviews
计算机科学, 2014, 41(8): 67-69. https://doi.org/10.11896/j.issn.1002-137X.2014.08.014
[14] 曹姗姗,王冲.
基于网页链接与用户反馈的PageRank算法改进研究
Improved PageRank Algorithm Based on Links and User Feedback
计算机科学, 2014, 41(12): 179-182. https://doi.org/10.11896/j.issn.1002-137X.2014.12.039
[15] 宫秀文,张佩云.
基于PageRank的社交网络影响最大化传播模型与算法研究
Research on Propagation Model and Algorithm for Influence Maximization in Social Network Based on PageRank
计算机科学, 2013, 40(Z6): 136-140.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!