Computer Science, 2021, Vol. 48, Issue (11): 170-175. doi: 10.11896/jsjkx.201100004

Special topic: Big Data & Data Science (virtual special topic)

• Database & Big Data & Data Science •

Study on Multi-source Data Fusion Framework Based on Graph

KUANG Guang-sheng1,2, GUO Yan2, YU Xiao-ming2, LIU Yue2, CHENG Xue-qi2

  1 University of Chinese Academy of Sciences, Beijing 100049, China
  2 Key Laboratory of Network Data Science & Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Received: 2020-11-02 Revised: 2021-03-18 Online: 2021-11-15 Published: 2021-11-10
  • Corresponding author: GUO Yan (guoy@ict.ac.cn)
  • About author: KUANG Guang-sheng (kuangguangsheng@ict.ac.cn), born in 1995, postgraduate. His main research interests include natural language processing and data fusion.
    GUO Yan, born in 1974, Ph.D, associate researcher. Her main research interests include network information acquisition.
  • Supported by:
    National Key Research and Development Program of China (2017YFB0803302).

Abstract: Most current studies analyze only single-source data for a given task, and methods that apply to multi-source data are lacking. As data become increasingly abundant, this paper proposes a multi-source data fusion framework for fusing data from multiple network platforms. Data on a single platform contain text and various attributes, while data from different platforms differ greatly in content and form. Most existing network information mining methods use only part of the data from a single platform and ignore the interactions between data from different platforms. The proposed framework addresses both issues. On the one hand, it relies on the representational power of graphs to fuse different types of features from the same platform, which improves task performance on a single platform. On the other hand, it uses the data features of different platforms to complement one another, which improves task performance across multiple platforms. The fused data types discussed in the paper are text, time, and author information, covering continuous, discrete, and unstructured features. On an event classification task, the framework improves the F1 value, which verifies the effectiveness of the proposed multi-source data fusion framework.

Key words: Multi-source data, Fusion representation, Graph fusion

CLC Number: TP391
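
The abstract describes fusing text, time, and author information through a graph. The sketch below is only an illustration of that general idea and is not the authors' framework: it assumes a simple bipartite graph in which every post is linked to feature nodes (words for unstructured text, an author node for the discrete attribute, and a time bucket for the continuous attribute). The function names, the field names `id`, `text`, `author`, and `hour`, the 6-hour bucket size, and the sample posts are all hypothetical.

```python
from collections import defaultdict


def build_fusion_graph(posts, hour_bucket=6):
    """Link each post to its feature nodes in an undirected adjacency map.

    Assumption: each post is a dict with hypothetical keys
    'id', 'text', 'author', and 'hour' (hour of day, 0-23).
    """
    graph = defaultdict(set)
    for post in posts:
        post_node = ("post", post["id"])
        # Unstructured feature: one node per distinct lowercase token.
        features = [("word", w) for w in set(post["text"].lower().split())]
        # Discrete feature: the author becomes a node of its own.
        features.append(("author", post["author"]))
        # Continuous feature: the hour is discretized into coarse buckets.
        features.append(("time", post["hour"] // hour_bucket))
        for feature in features:
            graph[post_node].add(feature)
            graph[feature].add(post_node)
    return graph


def shared_feature_neighbors(graph, post_id):
    """Score other posts by how many feature nodes they share with post_id."""
    source = ("post", post_id)
    scores = defaultdict(int)
    for feature in graph[source]:
        for other in graph[feature]:
            if other != source:
                scores[other[1]] += 1
    return dict(scores)


# Hypothetical posts from two platforms (a news site and a microblog).
posts = [
    {"id": "news-1", "text": "Earthquake hits coastal city",
     "author": "agency_a", "hour": 9},
    {"id": "blog-1", "text": "earthquake felt downtown",
     "author": "user_b", "hour": 10},
    {"id": "blog-2", "text": "concert tickets on sale",
     "author": "user_b", "hour": 20},
]

graph = build_fusion_graph(posts)
scores = shared_feature_neighbors(graph, "blog-1")
print(scores)
```

In this toy data, the microblog post `blog-1` is connected to the news post `news-1` through a shared word and a shared time bucket, and to `blog-2` only through the shared author. A real system would presumably learn node representations over such a graph and feed them to a classifier; that step is outside this sketch.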