Computer Science, 2025, Vol. 52, Issue (6A): 240400074-7. doi: 10.11896/jsjkx.240400074

• Big Data & Data Science •

Research on Fusion Optimization Method of Heterogeneous Data Dictionary in Grass-roots Social Grid Governance

WANG Qing, YANG Wanzhe, ZHANG Cong

  1. College of Information Science and Engineering, Northeastern University, Shenyang 110000, China
  • Online: 2025-06-16 Published: 2025-06-12
  • Corresponding author: WANG Qing (wangqing@ise.neu.edu.cn)
  • About author: WANG Qing, born in 1969, Ph.D, associate professor. His main research interests include modeling and optimization, manufacturing and service planning and scheduling, logistics and supply chain resources planning, e-commerce business optimization, and intelligent optimization algorithms.
  • Supported by:
    National Key Research and Development Program of China (2021YFC3300300).

Abstract: A data dictionary (DD) is an important part of database system design: it is a collection of data lists that describes the attributes, composition and structure of the data in a database. When developing general-purpose information systems, designers and developers often have to fuse and optimize existing heterogeneous data dictionaries. Because these dictionaries were designed without industry data standards, or within a limited business scope, they differ markedly in how data representations are defined and in how data composition and structure are designed. Their underlying data content, however, is highly amenable to fusion, and maintaining a fused data dictionary by hand costs a great deal of time and resources. Taking grass-roots social grid governance as the business background, this paper addresses the pain points of heterogeneous data dictionary fusion that arise when digital applications for grass-roots social governance are developed and rolled out, and studies fusion optimization methods and related techniques for heterogeneous data dictionaries. The proposed fusion approach takes into account both the completeness of data information and the integrity of data structure, and covers four aspects: semantic deduplication and disambiguation, keyword extraction, similarity calculation, and data dictionary table structure fusion. Experiments on data dictionaries from grass-roots social grid governance business show that the approach significantly improves fusion efficiency and fusion quality compared with traditional data dictionary fusion methods.

Key words: Data dictionary, Database design, Edit distance, Similarity calculation, Grass-roots social grid governance

CLC Number:

  • TP392
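
The key words name edit distance and similarity calculation as building blocks of the fusion approach, but the abstract does not spell out the paper's actual formulas. The following is therefore only a minimal sketch of one common way to combine the two: a Levenshtein edit distance normalized by the longer string's length, used to pair field names from two data dictionaries. The function names, the 0.6 threshold, and the sample field names are all assumptions made for illustration and do not come from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    return 1.0 - edit_distance(a, b) / longest


def match_fields(source, target, threshold=0.6):
    """Pair each source field with its most similar target field.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    matches = {}
    for s in source:
        best = max(target, key=lambda t: similarity(s, t), default=None)
        if best is not None and similarity(s, best) >= threshold:
            matches[s] = best
    return matches


if __name__ == "__main__":
    # Hypothetical field names from two grid-governance data dictionaries.
    print(match_fields(["resident_name", "phone"],
                       ["resident_nm", "grid_id"]))
```

A purely character-level measure like this cannot recognize synonyms, which is presumably why the abstract lists semantic deduplication and disambiguation and keyword extraction as separate steps alongside similarity calculation.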