Computer Science (计算机科学), 2025, Vol. 52, Issue (6A): 240400074-7. doi: 10.11896/jsjkx.240400074
王庆, 杨万哲, 张聪
WANG Qing, YANG Wanzhe, ZHANG Cong
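Abstract: A data dictionary (DD) is an important part of database system design: a collection of data lists describing the attributes, composition, and structure of the data in a database. When developing general-purpose information systems, designers and developers often face the problem of how to fuse and optimize existing heterogeneous data dictionaries. Because these dictionaries were designed without industry data standards or within a limited business scope, they differ markedly in how data are represented and defined and in how data composition and structure are designed. Their underlying data semantics are nevertheless highly fusible, so maintaining a fused data dictionary by hand costs a great deal of time and resources. Against the background of grassroots social grid governance, and targeting the pain point of heterogeneous data dictionary fusion when developing and rolling out digital applications for grassroots social governance, this paper studies methods and related techniques for fusing and optimizing heterogeneous data dictionaries. Taking into account the completeness of data information and the integrity of data structure, it designs fusion methods and techniques in four areas: semantic deduplication and disambiguation, keyword extraction, similarity calculation, and data dictionary table structure fusion. Fusion and optimization experiments on data dictionaries from grassroots social grid governance show that the proposed approach significantly improves fusion efficiency and effectiveness compared with traditional data dictionary fusion methods.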
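The similarity-calculation step can be illustrated with a minimal sketch. The abstract does not state which similarity measure the authors use, so the character-bigram cosine similarity below is a stand-in chosen for simplicity. The function names, the threshold of 0.5, and the sample field names are all hypothetical.

```python
import math
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count character bigrams of a field label (hypothetical tokenizer)."""
    cleaned = text.lower().replace("_", " ").strip()
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bigram-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty or one-character label has no bigrams
    return dot / (norm_a * norm_b)


def match_fields(fields_a, fields_b, threshold=0.5):
    """Map each field in fields_a to its most similar field in fields_b.

    A pair is kept only if its similarity reaches the (assumed) threshold;
    unmatched fields would be carried over unchanged into the fused table.
    """
    matches = {}
    for name_a in fields_a:
        vec_a = bigrams(name_a)
        best_name, best_score = None, 0.0
        for name_b in fields_b:
            score = cosine(vec_a, bigrams(name_b))
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matches[name_a] = best_name
    return matches
```

For example, `match_fields(["resident_name", "phone_number"], ["name_of_resident", "phone_no", "grid_id"])` pairs the two resident-name fields and the two phone fields, and leaves `grid_id` unmatched. A production approach would likely add synonym handling and keyword weighting, which this sketch omits.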