Computer Science ›› 2020, Vol. 47 ›› Issue (6A): 429-435. doi: 10.11896/JsJkx.190700161
张浩洋, 周良
ZHANG Hao-yang and ZHOU Liang
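Abstract: Text clustering methods often cannot change the number of clusters dynamically, and their classification results are not precise enough. To address these problems, this paper introduces and improves the Growing Hierarchical Self-Organizing Map (GHSOM) algorithm to raise the precision of text clustering, and uses the improved GHSOM algorithm to construct a knowledge map of civil aviation regulations. GHSOM has a multi-layer hierarchical structure in which each layer contains several independent growing SOMs; by growing in size, these maps describe the data set in finer detail and improve the classification results. On this basis, the laws and regulations of the civil aviation domain serve as the sample data set. Combined with Chinese word segmentation, keyword extraction, and document vectorization, the improved GHSOM algorithm is used to cluster the texts, which completes the construction of the civil aviation regulation knowledge map. Experimental results show that the proposed algorithm has a strong text clustering capability, that the knowledge map built with it achieves good classification results, and that evaluation metrics such as precision and recall are further improved.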
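The growth behaviour described in the abstract can be sketched as follows. This is a minimal illustration of the growth criteria usually given for the standard GHSOM algorithm, not the improved variant proposed in the paper, whose details the abstract does not state. All function names, the thresholds `tau1` and `tau2`, and the toy two-dimensional "document vectors" are assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(units, vector):
    """Index of the unit whose weight vector is closest to `vector`."""
    return min(range(len(units)), key=lambda i: distance(units[i], vector))


def unit_errors(units, vectors):
    """Mean quantization error per unit; None for units with no mapped vector."""
    mapped = [[] for _ in units]
    for v in vectors:
        i = best_matching_unit(units, v)
        mapped[i].append(distance(units[i], v))
    return [sum(d) / len(d) if d else None for d in mapped]


def map_mqe(errors):
    """Mean of the unit errors over units that received at least one vector."""
    used = [e for e in errors if e is not None]
    return sum(used) / len(used) if used else 0.0


def should_grow(errors, parent_qe, tau1):
    """Horizontal growth: keep adding units while MQE >= tau1 * parent error."""
    return map_mqe(errors) >= tau1 * parent_qe


def units_to_expand(errors, qe0, tau2):
    """Hierarchical growth: units whose error exceeds tau2 * layer-0 error
    are expanded into a new child map on the next layer."""
    return [i for i, e in enumerate(errors) if e is not None and e > tau2 * qe0]


# Toy data (assumed): two map units and four 2-D document vectors.
units = [[0.0, 0.0], [4.0, 0.0]]
vectors = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 3.0]]

errors = unit_errors(units, vectors)
print("unit errors:", errors)
print("map MQE:", map_mqe(errors))
print("grow map?", should_grow(errors, parent_qe=2.0, tau1=0.5))
print("units to expand:", units_to_expand(errors, qe0=2.0, tau2=0.7))
```

In this sketch a smaller `tau1` yields larger, flatter maps, while a smaller `tau2` yields deeper hierarchies; a full implementation would also need SOM training and a rule for where to insert new rows or columns, which are omitted here.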