计算机科学 ›› 2021, Vol. 48 ›› Issue (5): 124-129.doi: 10.11896/jsjkx.200500058

• 数据库&大数据&数据科学 • 上一篇    下一篇

基于随机投影和主成分分析的网络嵌入后处理算法

胡昕彤, 沙朝锋, 刘艳君   

  1. 复旦大学计算机科学技术学院 上海200433
  • 收稿日期:2020-05-14 修回日期:2020-07-06 出版日期:2021-05-15 发布日期:2021-05-09
  • 通讯作者: 沙朝锋(cfsha@fudan.edu.cn)
  • 基金资助:
    国家重点研发计划(2018YFB0904503)

Post-processing Network Embedding Algorithm with Random Projection and Principal Component Analysis

HU Xin-tong, SHA Chao-feng, LIU Yan-jun   

  1. School of Computer Science,Fudan University,Shanghai 200433,China
  • Received:2020-05-14 Revised:2020-07-06 Online:2021-05-15 Published:2021-05-09
  • About author:HU Xin-tong,born in 1996,postgra-duate.Her main research interests include natural language processing and software engineering.(18212010008@fudan.edu.cn)
    SHA Chao-feng,born in 1976,Ph.D,associate professor.His main research interests include machine learning and data mining,and natural language processing.
  • Supported by:
    National Key Research and Development Program of China (2018YFB0904503).

摘要: 网络嵌入作为网络表示学习,近年来受到了研究人员的广泛关注。目前,已有许多基于网络结构学习网络中结点的低维向量表示的模型,如DeepWalk等,并且这些模型在结点分类和链接预测等任务中取得了良好的效果。然而,随着网络规模的增大,多个网络嵌入算法存在计算瓶颈问题。为缓解该问题,可采用诸如随机投影这类无需学习的方法,但这样可能会丢失网络结构的关键信息,致使算法性能下降。为此,文中提出了一种网络嵌入的后处理算法PPNE(Post-Processing Network Embedding),该算法结合了随机投影以及主成分分析,有效地保留了网络结构的关键信息,保持了网络结构的高阶近似性。将所提算法与其他网络嵌入算法在3个公共数据集上针对结点分类和链接预测任务进行实验对比,以验证其有效性。实验结果表明,PPNE算法在运行速度和预测性能方面相比其他算法有较大的提升,尤其是该算法在保证良好任务效果的同时,运行速度比其他基于学习的算法提升了至少两个数量级。

关键词: 结点分类, 链接预测, 随机投影, 网络嵌入, 主成分分析

Abstract: Network embedding as network representation learning has received a lot of attention from researchers in recent years.A number of models based on low-dimensional vector representation of nodes in network structure learning networks,such as DeepWalk,have been developed with good results in tasks such as node classification and link prediction.However,with the network size increases,there are computational bottlenecks with multiple network embedding algorithms.To mitigate this problem,no-learning methods such as random projection can be used,but critical information about the network structure may be lost,resulting in degraded algorithm performance.In this paper,a post-processing algorithm for network embedding(PPNE) is proposed,which uses random projection as well as principal component analysis to effectively retain key information and maintain a higher order approximation of the network structure.Experiments are conducted on three public datasets for node classification and link prediction tasks,while the performance of the PPNE algorithm is verified against other network embedding algorithms.The experimental results show that the PPNE algorithm has a large improvement over other algorithms in terms of both perfor-mance and running time,and the algorithm has a speed improvement of at least two orders of magnitude over other learning-based algorithms while ensuring good task performance.

Key words: Link prediction, Network embedding, Node classification, Principal component analysis, Random projection

中图分类号: 

  • TP391
[1]CUI P,WANG X,PEI J,et al.A survey on network embedding[J].IEEE Transactions on Knowledge and Data Engineering,2018,31(5):833-852.
[2]SEN P,NAMATA G,BILGIC M,et al.Collective classification in network data[J].AI magazine,2008,29(3):93.
[3]LI BEN-NOWELL D,KLEINBERG J.The link-prediction pro-blem for social networks[J].Journal of the American society for information science and technology,2007,58(7):1019-1031.
[4]TU C,LIU Z,SUN M.Inferring correspondences from multiple sources for microblog user tags[C]//Chinese National Confe-rence on Social Media Processing.Springer,Berlin,Heidelberg,2014:1-12.
[5]HAN Z M,ZHENG C Y,DUAN D G,et al.Associated Users Mining Algorithm Based on Multi-information Fusion Representation Learning[J].Computer Science,2019,46(4):77-82.
[6]KOREN Y,BELL R,VOLINSKY C.Matrix factorization techniques for recommender systems[J].Computer,2009(8):30-37.
[7]PEROZZI B,AL-RFOU R,SKIENA S.Deepwalk:Online lear-ning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Disco-very and Data Mining.ACM,2014:701-710.
[8]WANG D,CUI P,ZHU W.Structural deep network embedding[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2016:1225-1234.
[9]ZHANG Z,CUI P,LI H,et al.Billion-scale network embedding with iterative random projection[C]//2018 IEEE International Conference on Data Mining(ICDM).IEEE,2018:787-796.
[10]CAO S,LU W,XU Q.Grarep:Learning graph representations with global structural information[C]//Proceedings of the 24th ACM International on Conference on Information and Know-ledge Management.ACM,2015:891-900.
[11]TANG J,QU M,WANG M,et al.Line:Large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2015:1067-1077.
[12]GROVER A,LESKOVEC J.node2vec:Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mi-ning.ACM,2016:855-864.
[13]ZHANG D,YIN J,ZHU X,et al.Homophily,structure,andcontent augmented network representation learning[C]//2016 IEEE 16th International Conference on Data Mining(ICDM).IEEE,2016:609-618.
[14]MIKOLOV T,CHEN K,CORRADO G,et al.Efficient estimation of word representations in vector space[J].arXiv:1301.3781,2013.
[15]PEROZZI B,KULKARNI V,SKIENA S.Walklets:Multiscale graph embeddings for interpretable network classification[J].arXiv:1605.02115,2016.
[16]CAO S,LU W,XU Q.Deep neural networks for learning graph representations[C]//Thirtieth AAAI Conference on Artificial Intelligence.2016.
[17]YANG C,LIU Z,ZHAO D,et al.Network representation lear-ning with rich text information[C]//Twenty-Fourth Internatio-nal Joint Conference on Artificial Intelligence.2015.
[18]CHEN J,ZHANG Q,HUANG X.Incorporate group information to enhance network embedding[C]//Proceedings of the 25th ACM International on Conference on Information and Knowledge Management.ACM,2016:1901-1904.
[19]YE Z L,ZHAO H X,ZHANG K,et al.Network Representation Learning Based on Multi-view Ensemble Algorithm[J].Computer Science,2019,46(1):117-125.
[20]KIPF T N,WELLING M.Semi-supervised classification withgraph convolutional networks[J].arXiv:1609.02907,2016.
[21]MU J,BHAT S,VISWANATH P.All-but-the-top:Simple and effective postprocessing for word representations[J].arXiv:1702.01417,2017.
[22]WOLD S,ESBENSEN K,GELADI P.Principal component ana-lysis[J].Chemometrics and Intelligent Laboratory Systems,1987,2(1/2/3):37-52.
[23]FAN R E,CHANG K W,HSIEH C J,et al.LIBLINEAR:A library for large linear classification[J].Journal of Machine Learning Research,2008,9(8):1871-1874.
[24]HANLEY J A,MCNEIL B J.The meaning and use of the area under a receiver operating characteristic(ROC) curve[J].Radio-logy,1982,143(1):29-36.
[25]MENDITTO A,PATRIARCA M,MAGNUSSON B.Under-standing the meaning of accuracy,trueness and precision[J].Accreditation and Quality Assurance,2007,12(1):45-47.
[26]YAO L,WANG L,PAN L,et al.Link prediction based on common-neighbors for dynamic social network[J].Procedia Computer Science,2016,83:82-89.
[27]SHIBATA N,KAJIKAWA Y,SAKATA I.Link prediction in citation networks[J].Journal of the American Society for Information Science and Technology,2012,63(1):78-85.
[28]ACHLIOPTAS D.Database-friendly random projections:Johnson-Lindenstrauss with binary coins[J].Journal of Computer and System Sciences,2003,66(4):671-687.
[29]PAPADIMITRIOU C H,RAGHAVAN P,TAMAKI H,et al.Latent semantic indexing:A probabilistic analysis[J].Journal of Computer and System Sciences,2000,61(2):217-235.
[1] 宋杰, 梁美玉, 薛哲, 杜军平, 寇菲菲.
基于无监督集群级的科技论文异质图节点表示学习方法
Scientific Paper Heterogeneous Graph Node Representation Learning Method Based onUnsupervised Clustering Level
计算机科学, 2022, 49(9): 64-69. https://doi.org/10.11896/jsjkx.220500196
[2] 黄丽, 朱焱, 李春平.
基于异构网络表征学习的作者学术行为预测
Author’s Academic Behavior Prediction Based on Heterogeneous Network Representation Learning
计算机科学, 2022, 49(9): 76-82. https://doi.org/10.11896/jsjkx.210900078
[3] 李其烨, 邢红杰.
基于最大相关熵的KPCA异常检测方法
KPCA Based Novelty Detection Method Using Maximum Correntropy Criterion
计算机科学, 2022, 49(8): 267-272. https://doi.org/10.11896/jsjkx.210700175
[4] 阙华坤, 冯小峰, 刘盼龙, 郭文翀, 李健, 曾伟良, 范竞敏.
Grassberger熵随机森林在窃电行为检测的应用
Application of Grassberger Entropy Random Forest to Power-stealing Behavior Detection
计算机科学, 2022, 49(6A): 790-794. https://doi.org/10.11896/jsjkx.210800032
[5] 陈世聪, 袁得嵛, 黄淑华, 杨明.
基于结构深度网络嵌入模型的节点标签分类算法
Node Label Classification Algorithm Based on Structural Depth Network Embedding Model
计算机科学, 2022, 49(3): 105-112. https://doi.org/10.11896/jsjkx.201000177
[6] 郭磊, 马廷淮.
基于好友亲密度的用户匹配
Friend Closeness Based User Matching
计算机科学, 2022, 49(3): 113-120. https://doi.org/10.11896/jsjkx.210200137
[7] 杨旭华, 王磊, 叶蕾, 张端, 周艳波, 龙海霞.
基于节点相似性和网络嵌入的复杂网络社区发现算法
Complex Network Community Detection Algorithm Based on Node Similarity and Network Embedding
计算机科学, 2022, 49(3): 121-128. https://doi.org/10.11896/jsjkx.210200009
[8] 郑苏苏, 关东海, 袁伟伟.
融合不完整多视图的异质信息网络嵌入方法
Heterogeneous Information Network Embedding with Incomplete Multi-view Fusion
计算机科学, 2021, 48(9): 68-76. https://doi.org/10.11896/jsjkx.210500203
[9] 杨宏鑫, 宋宝燕, 刘婷婷, 杜岳峰, 李晓光.
基于耦合随机投影的张量填充方法
Tensor Completion Method Based on Coupled Random Projection
计算机科学, 2021, 48(8): 66-71. https://doi.org/10.11896/jsjkx.200900055
[10] 吴善杰, 王新.
基于AGA-DBSCAN优化的RBF神经网络构造煤厚度预测方法
Prediction of Tectonic Coal Thickness Based on AGA-DBSCAN Optimized RBF Neural Networks
计算机科学, 2021, 48(7): 308-315. https://doi.org/10.11896/jsjkx.200800110
[11] 陈恒, 王维美, 李冠宇, 史一民.
四元数关系旋转的知识图谱补全模型
Knowledge Graph Completion Model Using Quaternion as Relational Rotation
计算机科学, 2021, 48(5): 225-231. https://doi.org/10.11896/jsjkx.200300093
[12] 杨旭华, 王晨.
基于网络嵌入与局部合力的复杂网络社区划分算法
Community Detection Algorithm in Complex Network Based on Network Embedding and Local Resultant Force
计算机科学, 2021, 48(4): 229-236. https://doi.org/10.11896/jsjkx.200200102
[13] 王艺皓, 丁洪伟, 李波, 保利勇, 张颖婕.
基于聚类与特征融合的蛋白质亚细胞定位预测
Prediction of Protein Subcellular Localization Based on Clustering and Feature Fusion
计算机科学, 2021, 48(3): 206-213. https://doi.org/10.11896/jsjkx.200200081
[14] 张健雄, 宋坤, 何鹏, 李兵.
基于图神经网络的软件系统中关键类的识别
Identification of Key Classes in Software Systems Based on Graph Neural Networks
计算机科学, 2021, 48(12): 149-158. https://doi.org/10.11896/jsjkx.210100200
[15] 徐新黎, 肖云月, 龙海霞, 杨旭华, 毛剑飞.
基于矩阵分解的属性网络嵌入和社区发现算法
Attributed Network Embedding Based on Matrix Factorization and Community Detection
计算机科学, 2021, 48(12): 204-211. https://doi.org/10.11896/jsjkx.210300060
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!