Computer Science ›› 2024, Vol. 51 ›› Issue (3): 109-117. doi: 10.11896/jsjkx.221200063

• Database & Big Data & Data Science •

Deep Collaborative Truth Discovery Based on Variational Multi-hop Graph Attention Encoder

ZHANG Guohao, WANG Yi, ZHOU Xi, WANG Baoquan

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
    University of Chinese Academy of Sciences, Beijing 100049, China
    Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Received: 2022-12-12 Revised: 2023-04-04 Online: 2024-03-15 Published: 2024-03-13
  • Corresponding author: WANG Yi (wangyi@ms.xjb.ac.cn)
  • About the authors: ZHANG Guohao (zhangguohao20@mails.ucas.ac.cn), born in 1997, postgraduate. His main research interests include big data governance. WANG Yi, born in 1986, Ph.D., professor, is a senior member of CCF (No. 98372S). His main research interests include big data governance and blockchain applications.
  • Supported by:
    Xinjiang Key Laboratory for Minority Speech and Language Information Processing (2020D04050), Natural Science Foundation for Distinguished Young Scholars of Xinjiang Uygur Autonomous Region, China (2022D01E04), Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01B67) and Youth Innovation Promotion Association of Chinese Academy of Sciences (2021434).


Abstract: In the era of big data, releasing the value of data often requires fusing multi-source data, and data conflict has become an inevitable key problem in this process. To identify true claims and reliable sources among conflicting data, researchers have proposed truth discovery methods. However, most existing truth discovery methods focus on the direct collaborative information between sources and claims and ignore the deeper indirect collaborative and adversarial information, so they cannot adequately represent the characteristics of sources and claims. To solve this problem, this paper proposes a truth discovery method based on a variational multi-hop graph attention encoder (TD-VMGAE). It constructs a bipartite graph from the inclusion relationship between sources and claims, uses multi-hop graph attention layers to aggregate indirect collaborative and adversarial information into the representation of each node, and designs a truth discovery variational auto-encoder that extracts the required categorical distribution from the node representations in order to classify sources and claims collaboratively. Experiments show that the proposed method performs well on three datasets of different scales, and ablation experiments and visualization verify its effectiveness and generalization ability.
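The abstract's central idea, that a source-claim bipartite graph lets information flow between sources that never share a claim directly, can be illustrated with a small sketch. It is not the TD-VMGAE implementation. It assumes a claim is an (object, value) pair, replaces the learned one-hop graph attention with uniform row-normalized weights, and uses a truncated attention diffusion with hop weights alpha(1 - alpha)^k in the style of multi-hop attention graph neural networks (MAGNA). The function names `build_bipartite`, `row_normalize`, `multi_hop_diffusion` and the parameters `alpha` and `hops` are hypothetical; the adversarial information and the variational auto-encoder are not modeled.

```python
# A minimal sketch (not the TD-VMGAE code): build a source-claim bipartite graph
# and spread information over several hops with a MAGNA-style attention diffusion.
# Assumptions: a claim is an (object, value) pair; learned one-hop attention is
# replaced by uniform row-normalised weights; alpha and hops are illustrative.


def build_bipartite(triples):
    """Return (nodes, adj) for (source, object, value) triples.

    Each source is linked to every claim (object, value) it asserts; adj is a
    symmetric 0/1 matrix with sources listed before claims.
    """
    sources = sorted({s for s, _, _ in triples})
    claims = sorted({(o, v) for _, o, v in triples})
    nodes = [("source", s) for s in sources] + [("claim", c) for c in claims]
    index = {node: i for i, node in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for s, o, v in triples:
        i, j = index[("source", s)], index[("claim", (o, v))]
        adj[i][j] = adj[j][i] = 1.0
    return nodes, adj


def row_normalize(adj):
    """Stand-in for one-hop graph attention: every neighbour gets equal weight."""
    result = []
    for row in adj:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def multi_hop_diffusion(att, alpha=0.15, hops=4):
    """Truncated diffusion sum_{k=0}^{hops} alpha * (1 - alpha)**k * att**k.

    Entry [i][j] is non-zero when j is reachable from i within `hops` steps, so
    two sources that share no claim can still exchange information. Rows are not
    renormalised: for nodes with at least one neighbour they sum to
    1 - (1 - alpha)**(hops + 1).
    """
    n = len(att)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # att**0
    result = [[0.0] * n for _ in range(n)]
    for k in range(hops + 1):
        theta = alpha * (1 - alpha) ** k
        for i in range(n):
            for j in range(n):
                result[i][j] += theta * power[i][j]
        if k < hops:
            power = matmul(power, att)  # att**(k + 1)
    return result


if __name__ == "__main__":
    triples = [
        ("s1", "capital_of_X", "A"),
        ("s2", "capital_of_X", "A"),
        ("s3", "capital_of_X", "B"),
        ("s2", "population_of_X", "10M"),
        ("s3", "population_of_X", "10M"),
    ]
    nodes, adj = build_bipartite(triples)
    diff = multi_hop_diffusion(row_normalize(adj))
    s1 = nodes.index(("source", "s1"))
    for name in ("s2", "s3"):
        j = nodes.index(("source", name))
        print(f"s1 -> {name}: direct={adj[s1][j]:.0f}, diffused={diff[s1][j]:.4f}")
```

In this toy example no two sources are directly adjacent, yet s1 receives a larger diffused weight toward s2, with which it shares the claim capital_of_X = A, than toward s3, which it reaches only through a four-hop path. Plain lists keep the sketch dependency-free; a practical model would use sparse tensors and learned attention scores.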

Key words: Data quality, Conflict resolution, Truth discovery, Multi-hop attention graph neural network, Variational auto-encoder

CLC number: TP391