Computer Science ›› 2024, Vol. 51 ›› Issue (3): 81-89. doi: 10.11896/jsjkx.230100037

• Database & Big Data & Data Science •

Academic Influence Ranking Algorithm Based on Topic Reputation and Dynamic Heterogeneous Network

CHEN Pan1, CHEN Hongmei2,3,4,5, LUO Chuan6

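    1 Tangshan Research Institute, Southwest Jiaotong University, Tangshan, Hebei 063000
    2 School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756
    3 Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756
    4 National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Chengdu 611756
    5 Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Chengdu 611756
    6 College of Computer Science, Sichuan University, Chengdu 610065
  • Received: 2023-01-09 Revised: 2023-05-25 Publication date: 2024-03-15 Release date: 2024-03-13
  • Corresponding author: CHEN Hongmei (hmchen@swjtu.edu.cn)
  • About the author: (pchen@my.swjtu.edu.cn)
  • Funding:
    National Natural Science Foundation of China (61976182, 62076171); Natural Science Foundation of Sichuan Province (2022NSFSC0898); Sichuan Science and Technology Achievements Transfer and Transformation Demonstration Project (2022ZHCG0005)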

Academic Influence Ranking Algorithm Based on Topic Reputation and Dynamic Heterogeneous Network

CHEN Pan1, CHEN Hongmei2,3,4,5, LUO Chuan6   

    1 Tangshan Research Institute, Southwest Jiaotong University, Tangshan, Hebei 063000, China
    2 School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756, China
    3 Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education, Chengdu 611756, China
    4 National Engineering Laboratory of Integrated Transportation Big Data Application Technology, Southwest Jiaotong University, Chengdu 611756, China
    5 Manufacturing Industry Chains Collaboration and Information Support Technology Key Laboratory of Sichuan Province, Southwest Jiaotong University, Chengdu 611756, China
    6 College of Computer Science, Sichuan University, Chengdu 610065, China
  • Received: 2023-01-09 Revised: 2023-05-25 Online: 2024-03-15 Published: 2024-03-13
  • About author: CHEN Pan, born in 1997, postgraduate. His main research interests include data mining and academic big data. CHEN Hongmei, born in 1971, Ph.D., professor, Ph.D. supervisor, is a member of CCF (No.19214M). Her main research interests include intelligent information processing, pattern recognition, etc.
  • Supported by:
    National Natural Science Foundation of China (61976182, 62076171), Natural Science Foundation of Sichuan Province (2022NSFSC0898) and Sichuan Science and Technology Achievements Transfer and Transformation Demonstration Project (2022ZHCG0005).

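Abstract: Effectively mining academic big data and analyzing the academic influence of papers helps researchers obtain important information. Dynamic changes in text content and in the structure of the academic network have an important effect on the ranking of papers by academic influence. However, existing algorithms for ranking papers by academic influence either fail to consider text content or fail to consider dynamic changes in the academic network structure. To address this problem, an academic influence ranking algorithm is proposed, called academic influence ranking based on topic reputation and dynamic heterogeneous network (TND-Rank). TND-Rank measures the influence of a paper's topic on the paper at a given time and embeds it into a paper influence ranking algorithm that takes the time factor into account. TND-Rank computes the dynamic academic influence ranking of papers by considering the combined effect of multiple factors, including topic reputation level, journal, author, and time. In the experiments, articles in the AMiner dataset that were published between 1936 and 2014 and whose information is complete were analyzed, the proposed algorithm was compared with four related algorithms from recent years, and Spearman's correlation coefficient, normalized discounted cumulative gain (NDCG), and graded average precision (GAP) were used to evaluate performance. The experimental results verify the feasibility and effectiveness of TND-Rank, which can effectively integrate various kinds of information to rank the academic influence of papers.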
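To make the idea of a time-aware citation ranking concrete, here is a minimal sketch in Python. It is not TND-Rank: it is a plain PageRank over a citation graph whose restart (teleport) vector favors recent papers through an exponential decay. The function name `time_weighted_pagerank`, the parameters `damping`, `decay`, and `iterations`, and the decay form are all assumptions made for illustration; the abstract does not specify the actual update rule, and topic, journal, and author information are left out entirely.

```python
import math


def time_weighted_pagerank(citations, pub_year, current_year,
                           damping=0.85, decay=0.1, iterations=50):
    """Illustrative time-aware PageRank (hypothetical, not TND-Rank).

    citations: dict mapping a paper id to the list of paper ids it cites.
    pub_year:  dict mapping every paper id to its publication year.
    """
    papers = sorted(pub_year)
    n = len(papers)

    # Assumed recency weighting: newer papers receive more restart mass.
    raw = {p: math.exp(-decay * (current_year - pub_year[p])) for p in papers}
    total = sum(raw.values())
    teleport = {p: raw[p] / total for p in papers}

    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new = {p: (1.0 - damping) * teleport[p] for p in papers}
        for p in papers:
            refs = citations.get(p, [])
            if refs:
                # A citing paper splits its score evenly among its references.
                share = damping * score[p] / len(refs)
                for q in refs:
                    new[q] += share
            else:
                # Papers with no references spread their score via teleport.
                for q in papers:
                    new[q] += damping * score[p] * teleport[q]
        score = new
    return score


if __name__ == "__main__":
    toy_citations = {"A": [], "B": ["A"], "C": ["A", "B"]}
    toy_years = {"A": 2000, "B": 2005, "C": 2010}
    ranking = time_weighted_pagerank(toy_citations, toy_years, current_year=2014)
    for paper, value in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(paper, round(value, 4))
```

The scores always sum to 1, so they can be read as a probability distribution over papers. A fuller model in the spirit of the abstract would add journal and author nodes and a topic-level weight, which this sketch deliberately omits.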

Keywords: Heterogeneous network, Academic influence, Academic big data, Topic reputation, Paper ranking

Abstract: Effectively mining academic big data and analyzing the academic influence of papers help researchers obtain important information. Dynamic changes in text content and in academic network structure have an important impact on how papers rank by academic influence. However, existing algorithms for ranking the academic influence of papers either ignore text content or ignore the dynamic changes in academic network structure. To solve this problem, this paper proposes an academic influence ranking algorithm based on topic reputation and a dynamic heterogeneous network, called TND-Rank. TND-Rank measures the impact of a paper's topic on the paper at a given time and embeds it into a paper influence ranking algorithm that accounts for the time factor. The dynamic academic influence ranking of a paper is computed by jointly considering several factors, i.e., topic reputation level, journal, author, and time. In the experiments, papers in the AMiner dataset that were published between 1936 and 2014 and have complete information are analyzed, and TND-Rank is compared with four related algorithms from recent years. Spearman's correlation coefficient, normalized discounted cumulative gain (NDCG), and graded average precision (GAP) are used to evaluate performance. The experimental results verify the feasibility and effectiveness of TND-Rank, which can effectively combine various kinds of information to rank the academic influence of papers.
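Two of the evaluation measures named in the abstract can be sketched briefly. The Python below computes Spearman's rank correlation (as the Pearson correlation of tie-averaged ranks) and NDCG with linear gain and a log2 discount, which is one common formulation. The function names and the toy inputs are assumptions for illustration; the exact NDCG variant and ground truth used in the paper are not stated here, and GAP is omitted.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def ndcg_at_k(relevance_in_ranked_order, k):
    """NDCG@k with linear gain and log2 discount (an assumed variant)."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance_in_ranked_order, reverse=True))
    return dcg(relevance_in_ranked_order) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    predicted_scores = [0.9, 0.4, 0.7, 0.1]   # hypothetical algorithm output
    reference_scores = [120, 30, 45, 5]       # hypothetical ground truth
    print(spearman(predicted_scores, reference_scores))
    print(ndcg_at_k([3, 1, 2, 0], k=3))
```

Spearman's coefficient compares the whole ordering, while NDCG weights the top of the list more heavily, which is why ranking studies often report both.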

Key words: Heterogeneous network, Academic influence, Academic big data, Topic reputation, Paper ranking

CLC number:

  • TP391