Computer Science ›› 2018, Vol. 45 ›› Issue (11): 220-225. doi: 10.11896/j.issn.1002-137X.2018.11.034

• Artificial Intelligence •

Co-author and Affiliate Based Name Disambiguation Approach

SHANG Yu-ling^1, CAO Jian-jun^2, LI Hong-mei^1, ZHENG Qi-bin^1

  1. College of Command Information Systems, PLA University of Science and Technology, Nanjing 210007, China
  2. The 63rd Research Institute, National University of Defense Technology, Nanjing 210007, China
  • Received:2017-10-25 Published:2019-02-25

Abstract: Name disambiguation is one of the most challenging issues in the entity resolution domain; it aims to solve the problem of the same name being shared by different people. However, most conventional approaches rely heavily on sufficient information about entities and fail to identify names when information is insufficient. This paper proposes a novel name disambiguation approach based on co-authors and authors' affiliations. Specifically, an entity relationship diagram is constructed from co-authorship and authors' affiliations, and a breadth-first search scheme is used to find the effective paths between each pair of authors with exactly the same name in the constructed diagram. A connection strength metric between authors is then calculated from the length of the effective paths, the number of effective paths, and the types of edges on each path, and it is compared with a threshold to achieve name disambiguation. Experimental results show that the proposed approach outperforms state-of-the-art approaches and that it can disambiguate authors sharing the same name even without co-authorship.
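The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the connection strength formula, so the scoring rule below (each path contributes the product of its edge weights divided by its length), the edge weights, the threshold, the maximum path length, and all node names are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical edge-type weights; the abstract does not state actual values.
EDGE_WEIGHT = {"coauthor": 1.0, "affiliation": 0.5}


def build_graph(edges):
    """Build an undirected, typed graph from (node_a, node_b, edge_type) triples."""
    graph = {}
    for a, b, kind in edges:
        graph.setdefault(a, []).append((b, kind))
        graph.setdefault(b, []).append((a, kind))
    return graph


def find_paths(graph, source, target, max_len=3):
    """Breadth-first enumeration of simple paths with at most max_len edges.

    Each path is returned as the list of edge types along it.
    """
    paths = []
    queue = deque([(source, (source,), ())])
    while queue:
        node, visited, kinds = queue.popleft()
        if node == target and kinds:
            paths.append(list(kinds))
            continue  # do not extend a path beyond the target
        if len(kinds) == max_len:
            continue
        for neighbor, kind in graph.get(node, []):
            if neighbor not in visited:
                queue.append((neighbor, visited + (neighbor,), kinds + (kind,)))
    return paths


def connection_strength(paths):
    """Assumed scoring: sum over paths of (product of edge weights) / (path length)."""
    total = 0.0
    for kinds in paths:
        weight = 1.0
        for kind in kinds:
            weight *= EDGE_WEIGHT[kind]
        total += weight / len(kinds)
    return total


def same_person(graph, a, b, threshold=0.4, max_len=3):
    """Treat two same-name references as one person if the strength reaches the threshold."""
    return connection_strength(find_paths(graph, a, b, max_len)) >= threshold


# Toy data: three references that share the name "Wang" (all names are made up).
edges = [
    ("Wang#1", "Li", "coauthor"),
    ("Li", "Wang#2", "coauthor"),
    ("Wang#1", "Inst A", "affiliation"),
    ("Wang#2", "Inst A", "affiliation"),
    ("Wang#3", "Zhao", "coauthor"),
]
graph = build_graph(edges)

print(connection_strength(find_paths(graph, "Wang#1", "Wang#2")))  # 0.625
print(same_person(graph, "Wang#1", "Wang#2"))  # True
print(same_person(graph, "Wang#1", "Wang#3"))  # False
```

In this toy data, Wang#1 and Wang#2 are linked by one co-author path and one affiliation path, so their strength exceeds the assumed threshold, while Wang#3 has no path to either and stays separate. Dropping the co-author edges leaves only the affiliation path, which shows how affiliation edges can still connect same-name authors who have no co-authorship, although the assumed threshold would then need to be lower for a match.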

Key words: Connection strength, Data quality, Effective path, Entity resolution, Name disambiguation

CLC Number: TP311