Computer Science ›› 2024, Vol. 51 ›› Issue (3): 109-117. doi: 10.11896/jsjkx.221200063

• Database & Big Data & Data Science •

Deep Collaborative Truth Discovery Based on Variational Multi-hop Graph Attention Encoder

ZHANG Guohao, WANG Yi, ZHOU Xi, WANG Baoquan   

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
    University of Chinese Academy of Sciences, Beijing 100049, China
    Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Received: 2022-12-12  Revised: 2023-04-04  Online: 2024-03-15  Published: 2024-03-13
  • About author: ZHANG Guohao, born in 1997, postgraduate. His main research interests include big data governance. WANG Yi, born in 1986, Ph.D., professor, is a senior member of CCF (No. 98372S). His main research interests include big data governance and blockchain applications.
  • Supported by:
    Xinjiang Key Laboratory for Minority Speech and Language Information Processing (2020D04050), Natural Science Foundation for Distinguished Young Scholars of Xinjiang Uygur Autonomous Region, China (2022D01E04), Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01B67) and Youth Innovation Promotion Association of Chinese Academy of Sciences (2021434).

Abstract: In the era of big data, releasing the value of data often requires fusing multi-source data, and data conflict has become an unavoidable key problem in this process. To filter true claims and reliable sources out of conflicting data, researchers have proposed truth discovery methods. However, existing truth discovery methods focus on the direct collaborative information between sources and claims and ignore the deeper indirect collaborative and adversarial information, so they cannot adequately express the characteristics of sources and claims. To solve this problem, this paper proposes a truth discovery method based on a variational multi-hop graph attention encoder (TD-VMGAE). It constructs a bipartite graph network from the inclusion relationship between sources and claims, uses a multi-hop graph attention layer to aggregate indirect collaborative and adversarial information for each node, and designs a truth discovery variational auto-encoder to extract the categorical distribution required for node representation, with which sources and claims are classified collaboratively. Experiments show that the proposed method performs well on three datasets of different scales, and ablation experiments and visualization verify its effectiveness and generalization ability.
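
The idea of gathering indirect information over the source-claim bipartite graph can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, uniform edge weights stand in for learned attention scores, and the geometric decay `alpha * (1 - alpha) ** k` over hop counts is an assumed PageRank-style weighting rather than a detail taken from the abstract. The variational auto-encoder stage is omitted.

```python
from collections import defaultdict


def build_bipartite_adjacency(observations):
    """Row-normalised adjacency of the source-claim bipartite graph.

    observations: iterable of (source, claim) pairs, "source asserts claim".
    Nodes are tagged ("s", name) or ("c", name) so names cannot collide.
    """
    neighbours = defaultdict(set)
    for source, claim in observations:
        s, c = ("s", source), ("c", claim)
        neighbours[s].add(c)
        neighbours[c].add(s)
    # Assumption: uniform weights replace learned attention coefficients.
    return {u: {v: 1.0 / len(vs) for v in vs} for u, vs in neighbours.items()}


def multi_hop_weights(adj, start, hops=3, alpha=0.2):
    """Row `start` of sum_{k=0..hops} alpha * (1 - alpha)^k * A^k."""
    current = {start: 1.0}  # row `start` of A^0
    total = defaultdict(float)
    for k in range(hops + 1):
        coeff = alpha * (1 - alpha) ** k
        for node, weight in current.items():
            total[node] += coeff * weight
        # Advance one hop: current <- current @ A
        nxt = defaultdict(float)
        for node, weight in current.items():
            for nb, a in adj[node].items():
                nxt[nb] += weight * a
        current = nxt
    return dict(total)


if __name__ == "__main__":
    obs = [("s1", "c1"), ("s2", "c1"), ("s2", "c2"), ("s3", "c2")]
    adj = build_bipartite_adjacency(obs)
    for node, weight in sorted(multi_hop_weights(adj, ("s", "s1")).items()):
        print(node, round(weight, 4))
```

In this toy graph, source `s1` asserts only claim `c1`, yet within three hops it also receives weight from source `s2` (which agrees with it on `c1`) and from claim `c2` (which it never asserted). A one-hop neighbourhood would miss both, which is the kind of indirect signal the abstract refers to.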

Key words: Data quality, Conflict resolution, Truth discovery, Multi-hop attention graph neural network, Variational auto-encoder

CLC Number: 

  • TP391