Computer Science ›› 2024, Vol. 51 ›› Issue (2): 1-14. doi: 10.11896/jsjkx.221200075

• Discipline Frontier •

Multi-source Heterogeneous Data Fusion Technologies and Government Big Data Governance System

YAN Jiahe1, LI Honghui1, MA Ying2, LIU Zhen1, ZHANG Dalin3, JIANG Zhouxian1, DUAN Yuhang1   

    1 School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
    2 National Information Center, Beijing 100045, China
    3 School of Software, Beijing Jiaotong University, Beijing 100044, China
  • Received:2022-12-11 Revised:2023-03-13 Online:2024-02-15 Published:2024-02-22
  • About author: YAN Jiahe, born in 1994, Ph.D candidate. Her main research interests include data mining and data fusion. LI Honghui, born in 1964, professor, Ph.D supervisor. Her main research interests include software testing, big data governance and rail transit information technology.
  • Supported by:
    National Key Research and Development Program of China (2019YFB2102500).

Abstract: With the rapid development of information technology, the data held by governments and enterprises are growing exponentially. However, data from multiple sources arrive in different formats, low data quality degrades application results, decentralized data management weakens integrated services, and heterogeneous data modalities cause semantic gaps. Against this background, multi-source heterogeneous data fusion effectively integrates multi-modal data from different sources to achieve information complementarity and data association, thereby enhancing the information available. At present, most studies focus on the big data governance process or on multi-modal deep learning, and few works discuss an integral multi-source heterogeneous data fusion framework. Therefore, after reviewing the key technologies, this paper proposes a key-technology framework for multi-source heterogeneous data fusion that covers the process of “data collection-data cleaning-data integration-data fusion”, and introduces the problems and tasks of each stage. Then, through an example government affairs application, a data governance system for government data is designed, which further illustrates the significance of multi-source heterogeneous data fusion. Finally, the paper is summarized and future work is discussed.

Key words: Multi-source heterogeneous data, Multi-modal data fusion, Data governance technology, Big data of government affairs, Big data governance process

CLC Number: 

  • TP311