Computer Science ›› 2024, Vol. 51 ›› Issue (6A): 230800150-7. doi: 10.11896/jsjkx.230800150

• Big Data & Data Science •

Ontology-driven Study on Information Structuring of Aeronautical Information Tables

LAI Xin, LI Sining, LIANG Changsheng, ZHANG Hengyan

  1. Civil Aviation Flight University of China, Guanghan, Sichuan 618307, China
  • Published: 2024-06-06
  • Corresponding author: LI Sining (1825614449@qq.com)
  • About author: LAI Xin (lxrzg@163.com), born in 1977, Ph.D., associate professor. Her main research interests are aeronautical information services and management.
    LI Sining, born in 1998, postgraduate. Her main research interest is traffic and transportation.
  • Supported by:
    Natural Science Foundation of Sichuan Province, China (2023NSFSC0903) and Key Program of the Central Universities at the School Level (ZJ2023-003).

Abstract: The aeronautical information publication (AIP) is the main carrier recommended by the International Civil Aviation Organization (ICAO) for presenting each country's aeronautical information, and it summarizes a large amount of aeronautical data and operational restriction information in tabular form. To enable intelligent querying of AIPs and the mining and use of the static data they contain, the tabular information in AIPs must undergo feature extraction and structuring. Taking the tabular information in AIPs as the research object, this paper proposes an ontology-driven method for the structured extraction of aeronautical information tables. First, an ontology framework for aeronautical information is constructed to give a unified, standardized description of domain knowledge. Second, Document AI is used to study and preprocess the layout structure of table documents, and a random forest algorithm and a conditional random field (CRF) model are used to verify and analyze feature entity extraction. Experimental results show that the proposed method can effectively extract feature entities from aeronautical information tables, providing a reference for in-depth mining of static data in the aeronautical information field.

Key words: Aeronautical information, Ontology, Named entity recognition, Conditional random field, Random forest, Document AI
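As a rough illustration of what "ontology-driven" structuring of a table can mean, the sketch below maps table column headers onto the properties of an ontology class and emits one structured record per row. It is a minimal sketch under stated assumptions, not the method of the paper: the `ONTOLOGY` dictionary, the `Runway` class, its property names, the header synonyms, and the sample row are all hypothetical, and the Document AI layout step and the random forest and CRF entity extraction are not reproduced here.

```python
# Hypothetical mini-ontology (an assumption for illustration only):
# class name -> property name -> set of header strings that denote it.
ONTOLOGY = {
    "Runway": {
        "designator": {"rwy", "designation", "runway designator"},
        "dimensions": {"dimensions of rwy", "dimensions"},
        "surface": {"surface", "surface of rwy"},
    },
}


def normalize(text):
    """Lower-case a header and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def map_headers(headers, ontology_class):
    """Return {column index: property name} for headers the ontology knows."""
    props = ONTOLOGY[ontology_class]
    mapping = {}
    for idx, header in enumerate(headers):
        key = normalize(header)
        for prop, synonyms in props.items():
            if key in synonyms:
                mapping[idx] = prop
                break
    return mapping


def structure_table(headers, rows, ontology_class):
    """Turn table rows into records keyed by ontology property names.

    Columns whose header matches no property are skipped.
    """
    mapping = map_headers(headers, ontology_class)
    records = []
    for row in rows:
        record = {"class": ontology_class}
        for idx, prop in mapping.items():
            if idx < len(row):
                record[prop] = row[idx].strip()
        records.append(record)
    return records


if __name__ == "__main__":
    # Made-up sample table, not data from the paper.
    sample_headers = ["RWY", "Dimensions of RWY", "Surface", "Remarks"]
    sample_rows = [["18L", "3800 x 60 m", "Concrete", "NIL"]]
    print(structure_table(sample_headers, sample_rows, "Runway"))
```

The point of the design is that the ontology, not the table layout, decides which columns become fields. A learned extractor such as a CRF could then replace the exact-match header lookup without changing the shape of the output records.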

CLC Number: 

  • TP391