Computer Science, 2020, Vol. 47, Issue (6A): 429-435. doi: 10.11896/JsJkx.190700161

• Database & Big Data & Data Science •

Application of Improved GHSOM Algorithm in Civil Aviation Regulation Knowledge Map Construction

ZHANG Hao-yang, ZHOU Liang

  1. School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Published: 2020-07-07
  • Corresponding author: ZHANG Hao-yang (zhang405002@163.com)
  • About author: ZHANG Hao-yang, born in 1994, postgraduate. His main research interests include natural language processing.

Abstract: To address two problems in text clustering, namely that the number of clusters cannot change dynamically and that text classification results are not precise enough, this paper introduces and improves the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm to raise text clustering precision, and uses the improved algorithm to build a knowledge map of civil aviation regulations. GHSOM has a multi-layer hierarchical structure in which each layer contains several independent growing SOMs; by growing in size, the maps describe the data set in more detail and improve the classification effect. On this basis, the laws and regulations of the civil aviation field serve as the sample data set. Combining Chinese word segmentation, keyword extraction, and document vectors, the improved GHSOM algorithm clusters the texts, and the civil aviation regulation knowledge map is then constructed. Experimental results show that the proposed algorithm clusters text effectively, that the knowledge map built with it achieves good classification results, and that evaluation metrics such as precision and recall are further improved.

Key words: GHSOM, Knowledge map, Natural language processing, Text clustering, word2vec

CLC Number:

  • TP391.1
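
The abstract describes GHSOM as layers of independently growing SOMs. The sketch below illustrates that growth idea in its commonly described generic form, not the improved variant proposed in the paper, whose details the abstract does not give. The thresholds `tau1` and `tau2`, the cap `max_units`, the simplified training rule, and the toy 2-D data are all assumptions made for illustration; real inputs would be document vectors.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vec(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]

def mqe(unit, vs):
    """Mean quantization error of one unit over the vectors mapped to it."""
    return sum(dist(unit, v) for v in vs) / len(vs) if vs else 0.0

def cells(grid):
    return [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]

def assign(grid, data):
    """Map every vector to its best-matching unit: {(row, col): [vectors]}."""
    hits = {rc: [] for rc in cells(grid)}
    for v in data:
        best = min(hits, key=lambda rc: dist(grid[rc[0]][rc[1]], v))
        hits[best].append(v)
    return hits

def train(grid, data, epochs=20, lr=0.5):
    """Simplified online SOM update (assumption): the best-matching unit
    moves by the current rate, its 4-neighbours by half of it."""
    for e in range(epochs):
        rate = lr * (1 - e / epochs)
        for v in data:
            br, bc = min(cells(grid), key=lambda rc: dist(grid[rc[0]][rc[1]], v))
            for r, c in cells(grid):
                d = abs(r - br) + abs(c - bc)
                if d <= 1:
                    h = rate if d == 0 else rate / 2
                    grid[r][c] = [w + h * (x - w) for w, x in zip(grid[r][c], v)]

def grow(grid, hits):
    """Insert a row or column between the unit with the highest error
    and its most dissimilar direct neighbour."""
    er, ec = max(hits, key=lambda rc: mqe(grid[rc[0]][rc[1]], hits[rc]))
    nbrs = [(r, c) for r, c in ((er - 1, ec), (er + 1, ec), (er, ec - 1), (er, ec + 1))
            if 0 <= r < len(grid) and 0 <= c < len(grid[0])]
    dr, dc = max(nbrs, key=lambda rc: dist(grid[er][ec], grid[rc[0]][rc[1]]))
    if dr != er:  # vertical neighbour: add a row between the two rows
        top = min(er, dr)
        grid.insert(top + 1, [mean_vec([grid[top][c], grid[top + 1][c]])
                              for c in range(len(grid[0]))])
    else:         # horizontal neighbour: add a column between the two columns
        left = min(ec, dc)
        for row in grid:
            row.insert(left + 1, mean_vec([row[left], row[left + 1]]))

def ghsom_layer(data, parent_mqe, tau1=0.5, max_units=16, seed=0):
    """Grow one map until its mean error drops below tau1 * parent_mqe
    (or the hypothetical size cap max_units is reached)."""
    rng = random.Random(seed)
    grid = [[list(rng.choice(data)) for _ in range(2)] for _ in range(2)]
    while True:
        train(grid, data)
        hits = assign(grid, data)
        used = [rc for rc in hits if hits[rc]]
        map_mqe = sum(mqe(grid[r][c], hits[(r, c)]) for r, c in used) / len(used)
        if map_mqe < tau1 * parent_mqe or len(grid) * len(grid[0]) >= max_units:
            return grid, hits, map_mqe
        grow(grid, hits)

def expand_candidates(grid, hits, mqe0, tau2=0.1):
    """Units whose error is still large relative to the layer-0 error;
    each would receive its own child map in the next layer."""
    return [rc for rc, vs in hits.items()
            if vs and mqe(grid[rc[0]][rc[1]], vs) > tau2 * mqe0]

# Toy data (assumption): three loose 2-D clusters standing in for document vectors.
_rng = random.Random(1)
data = [[_rng.gauss(cx, 0.1), _rng.gauss(cy, 0.1)]
        for cx, cy in [(0, 0), (1, 1), (0, 1)] for _ in range(15)]
mqe0 = mqe(mean_vec(data), data)          # error of the single layer-0 unit
grid, hits, map_mqe = ghsom_layer(data, mqe0)
to_expand = expand_candidates(grid, hits, mqe0)
print(len(grid), "x", len(grid[0]), "map; units to expand:", to_expand)
```

In this sketch `tau1` controls how wide each map grows and `tau2` controls how deep the hierarchy goes, which is how the number of clusters can adapt to the data instead of being fixed in advance.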