Computer Science, 2021, Vol. 48, Issue (3): 174-179. doi: 10.11896/jsjkx.191200154
ZHANG Han-shuo, YANG Dong-ju
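Abstract: With the continuous growth of scientific and technological data, science and technology departments have accumulated large amounts of management data on S&T projects. This largely structured but scattered data must be organized and analyzed so that data query and extraction services can be provided on demand. Because relational databases handle the analysis of association relationships poorly, this paper introduces a relation graph for data processing to improve analysis efficiency. First, a word-frequency-based entity search and localization algorithm is proposed to extract entities and relations and to construct the relation graph. Second, the relation graph is analyzed, and a frequent-itemset mining algorithm for graph data based on an improved FP-growth is proposed. Third, a graph-based data screening process is designed to screen and analyze the data; a scoring matrix is defined to evaluate the data under screening and produce analysis opinions, and the evaluation criteria used for screening can be customized. Finally, the algorithms are applied in practice to the constructed relation graph and encapsulated as services. Experimental results show that the proposed improved FP-growth frequent-itemset mining algorithm achieves a 10% to 12% improvement in running time over the traditional FP-growth algorithm, and that the accuracy of the data screening process reaches about 97%.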
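The abstract does not describe what the improvement to FP-growth consists of, so the following is only a minimal Python sketch of the traditional FP-growth baseline that the paper compares against, not the authors' improved algorithm. It assumes each transaction is a list of hashable, mutually comparable labels (for example entity or relation names pulled from the relation graph) and that `min_support` is an absolute count; the names `FPNode`, `build_fp_tree`, `fp_growth` and `mine_frequent_itemsets` are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item label, its count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the support of every frequent item and a header table mapping
    each frequent item to the list of tree nodes that carry it.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep only frequent items, sorted by descending support (ties broken
        # by label) so transactions with a common prefix share one tree path.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(weighted_transactions, min_support, suffix=frozenset()):
    """Recursively mine all frequent itemsets; returns {frozenset: support}."""
    frequent, header = build_fp_tree(weighted_transactions, min_support)
    patterns = {}
    for item, item_support in frequent.items():
        itemset = suffix | {item}
        patterns[itemset] = item_support
        # Conditional pattern base: the prefix path above every node holding
        # `item`, weighted by that node's count.
        conditional_base = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent.item is not None:  # stop at the root
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional_base.append((path, node.count))
        patterns.update(fp_growth(conditional_base, min_support, itemset))
    return patterns


def mine_frequent_itemsets(transactions, min_support):
    """Convenience wrapper: every raw transaction has weight 1."""
    return fp_growth([(t, 1) for t in transactions], min_support)
```

For instance, with the five transactions `[["bread", "milk"], ["bread", "diaper", "beer", "eggs"], ["milk", "diaper", "beer", "cola"], ["bread", "milk", "diaper", "beer"], ["bread", "milk", "diaper", "cola"]]` and `min_support=3`, the sketch returns eight itemsets: the four single items `bread`, `milk`, `diaper`, `beer` and the pairs `{bread, milk}`, `{bread, diaper}`, `{milk, diaper}`, `{diaper, beer}`. Compressing shared prefixes into one tree and mining conditional trees recursively is what lets FP-growth avoid Apriori-style candidate generation; any speed-up such as the one reported above would come from changes to these steps that the abstract does not specify.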