Computer Science, 2024, Vol. 51, Issue 3: 109-117. doi: 10.11896/jsjkx.221200063
张国昊, 王轶, 周喜, 王保全
ZHANG Guohao, WANG Yi, ZHOU Xi, WANG Baoquan
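Abstract: In the era of big data, unlocking the value of data often requires fusing data from multiple sources, and data conflicts are an unavoidable key problem in this process. To identify true claims and reliable data sources among conflicting data, researchers have proposed truth discovery methods. However, most existing truth discovery methods focus on the direct collaborative information between data sources and claims and ignore the deeper indirect collaborative and adversarial information, so they cannot adequately express the features of data sources and claims. To address this problem, this paper proposes a truth discovery method based on a variational multi-hop graph attention encoder (TD-VMGAE). The method builds a bipartite graph network from the containment relations between data sources and claims, uses multi-hop graph attention layers to aggregate indirect collaborative information and adversarial information into each node representation, and designs a truth discovery variational autoencoder that extracts the required categorical distributions from the node representations to jointly classify data sources and claims. Experimental results show that the proposed method performs well on three datasets of different scales, and ablation experiments and visualization further support its effectiveness and generalization ability.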
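The graph construction step mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a source-claim bipartite graph might be built from containment pairs and how a two-hop neighborhood exposes indirect collaborative information (sources that share a claim). It is not the TD-VMGAE model: there is no attention layer and no variational autoencoder here, and the function names, the `(source, claim)` pair format, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def build_bipartite_graph(observations):
    """Build both adjacency maps of a source-claim bipartite graph.

    `observations` is an iterable of (source, claim) pairs, meaning
    "this source contains this claim" (hypothetical input format).
    """
    source_to_claims = defaultdict(set)
    claim_to_sources = defaultdict(set)
    for source, claim in observations:
        source_to_claims[source].add(claim)
        claim_to_sources[claim].add(source)
    return dict(source_to_claims), dict(claim_to_sources)


def two_hop_sources(source, source_to_claims, claim_to_sources):
    """Return the sources reachable in two hops (source -> claim -> source).

    These are sources that agree with `source` on at least one claim,
    one simple form of indirect collaborative information.
    """
    neighbors = set()
    for claim in source_to_claims.get(source, ()):
        neighbors |= claim_to_sources[claim]
    neighbors.discard(source)  # a source is not its own neighbor
    return neighbors


# Toy data (invented for illustration only).
observations = [
    ("s1", "capital=Paris"),
    ("s2", "capital=Paris"),
    ("s3", "capital=Lyon"),
    ("s1", "population=67M"),
]
s2c, c2s = build_bipartite_graph(observations)
print(two_hop_sources("s1", s2c, c2s))
```

In this toy graph, `s1` and `s2` are two-hop neighbors because they share a claim, while `s3` has none. A multi-hop attention layer would weight such indirect neighbors when aggregating node representations, rather than treating them uniformly as this sketch does.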