Computer Science ›› 2024, Vol. 51 ›› Issue (11A): 240100095-7. doi: 10.11896/jsjkx.240100095

• Big Data & Data Science •

Online and Offline Multi-source Heterogeneous Data Fusion System for Recycling Information

QIU Mingxin, LEI Shuai, LIU Xianhui, ZHANG Yingyao

  1. School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
  • Online: 2024-11-16 Published: 2024-11-13
  • Corresponding author: ZHANG Yingyao (zhangyingyao@tongji.edu.cn)
  • About author: (2233091@tongji.edu.cn) QIU Mingxin, born in 2000, postgraduate. His main research interests include machine learning and big data.
    ZHANG Yingyao, born in 1984, Ph.D., associate professor. Her main research interests include machine learning and big data.
  • Supported by:
    National Key Research and Development Program of China (2022YFB3305802).

Abstract: In the resource recycling industry, multiple systems work together during the recovery of waste products and generate large volumes of multi-source heterogeneous data. To address the difficulty of fusing and effectively using online and offline recycling information for waste products, an online and offline multi-source heterogeneous data fusion system for recycling information is proposed. First, the system uses Web API interfaces to ingest online and offline multi-source heterogeneous data, and preprocesses the data through parsing, cleaning, and conversion. Second, because existing fusion methods based on cluster analysis usually require the number of clusters to be specified in advance, a fusion method based on multi-objective clustering is proposed that determines the number of clusters automatically during fusion. The preprocessed data undergo feature selection, label encoding, data conversion, and normalization; a multi-objective clustering algorithm then performs feature extraction and clustering on a subset of typical data, and the full and incremental data are matched on the basis of Euclidean distance. Finally, the system adopts a distributed database scheme based on MyCat middleware and MySQL master-slave replication to store, share, and exchange the fused data. Tests show that the system can fuse, share, and exchange online and offline multi-source heterogeneous recycling information for waste products. Moreover, compared with a K-Means-based fusion method, the proposed multi-objective clustering method automatically determines the optimal number of clusters on different data sets and achieves intra-cluster compactness and inter-cluster separation no worse than those of the K-Means fusion method.

Key words: Clustering, Multi-objective optimization, Multi-source heterogeneous data, Data fusion

CLC Number:

  • TP391
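
The abstract describes label encoding and normalization of the preprocessed data, followed by Euclidean-distance matching of full and incremental records against the clusters found on typical data. The sketch below illustrates only that matching idea in plain Python; it is not the authors' implementation. The record fields (category, weight in kg), the helper names, and the two centroids are hypothetical, and the centroids stand in for whatever the multi-objective clustering step would produce.

```python
import math
from typing import Dict, List, Sequence


def label_encode(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to an integer code, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; a constant column becomes 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(x - lo) / sp if sp else 0.0 for x, lo, sp in zip(row, lows, spans)]
        for row in rows
    ]


def nearest_cluster(record: Sequence[float],
                    centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest in Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(record, centroids[k]))


# Hypothetical recycling records: (product category, weight in kg).
records = [("phone", 0.2), ("laptop", 2.1), ("phone", 0.25), ("fridge", 60.0)]

codes = label_encode([category for category, _ in records])
features = min_max_normalize(
    [[float(codes[category]), weight] for category, weight in records]
)

# Assumed centroids in the normalized feature space (illustrative only).
centroids = [[1.0, 0.0], [0.0, 1.0]]
assignments = [nearest_cluster(f, centroids) for f in features]
print(assignments)
```

Note that label encoding imposes an arbitrary order on categories, so distances along that axis are only meaningful to the extent the chosen codes are; a real system would also reuse the normalization bounds computed on the typical data when matching later incremental records.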