Computer Science, 2022, Vol. 49, Issue (11A): 211100119-6. doi: 10.11896/jsjkx.211100119

• Artificial Intelligence •

Analysis of Technology Trends Based on Deep Learning and Text Measurement

WEI Ru-ming1, CHEN Ruo-yu1, LI Han1, LIU Xu-hong1,2

  1. 1 Laboratory of Data Science and Information Studies, Beijing Information Science and Technology University, Beijing 100101, China
    2 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
  • Online: 2022-11-10 Published: 2022-11-21
  • Corresponding author: CHEN Ruo-yu (ruoyu-chen@foxmail.com)
  • About author: WEI Ru-ming (ruming_wei@163.com), born in 1997, postgraduate. His main research interests include natural language processing and knowledge graphs.
    CHEN Ruo-yu, born in 1982, Ph.D., lecturer, is a member of the China Computer Federation. His main research interests include natural language processing, data mining, and semantic networks.
  • Supported by:
    Qin Xin Talents Cultivation Program, Beijing Information Science & Technology University (2021); Promoting the Classified Development of Universities, Key Research and Cultivation Project: Research on the Construction of an Ontology Deep Belief Network Model Suitable for Smart City Application Scenarios (2121YJPY225); Innovation Capacity Building of Scientific Research Institutions: Institute of Data Science and Information Analysis; Promoting the Connotative Development of Universities: Innovative Scientific Research Platform Construction Project for Edge Computing (2020KYNH105).

Abstract: Traditional technology trend analysis has to be carried out by experienced analysts and involves extensive literature review and data analysis, which is time-consuming and labor-intensive. To address this problem, this paper proposes a technology trend analysis model based on deep learning and text measurement. A domain-specific named entity recognition (NER) algorithm is designed on top of the BERT_BiLSTM_CRF model, with an optimized BERT masking mechanism. Using news articles and papers from the integrated circuit field as the dataset, BiLSTM_CRF, BERT_BiGRU_CRF, and other models are compared with the optimized BERT_BiLSTM_CRF* model proposed in this paper, in order to study how well NER performs on data from the integrated circuit field and similar domains. Compared with the other algorithms, the proposed domain-specific NER algorithm reaches an F1 value of 88.6%, laying the foundation for technology trend analysis. Because knowledge graphs express relationships naturally, the paper further proposes a method that combines knowledge graphs with text measurement, and visualizes the results of technology trend analysis from several perspectives, ultimately assisting analysts in carrying out intelligent technology trend analysis.

Key words: Named entity recognition, Knowledge graph, BERT_BiLSTM_CRF, Text measurement, Technology trend analysis
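
The abstract reports NER quality as an F1 value but does not describe the evaluation protocol. The following is a minimal sketch of how such a score is commonly computed, assuming BIO-style tags and exact-match, entity-level scoring; both are assumptions, not details confirmed by the source. The function names and the `TECH` and `ORG` labels are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence, e.g. ['B-TECH', 'I-TECH', 'O']."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel "O" flushes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))  # close the span that just ended
        if tag.startswith("B-"):
            start, label = i, tag[2:]     # open a new span
        else:
            start, label = None, None     # "O" or a stray "I-" tag: no open span
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical one-sentence example: the prediction misses the second entity.
gold_tags = [["B-TECH", "I-TECH", "O", "B-ORG"]]
pred_tags = [["B-TECH", "I-TECH", "O", "O"]]
precision, recall, f1 = entity_f1(gold_tags, pred_tags)
```

In this example the single predicted span is correct, so precision is 1.0, while only one of the two gold entities is found, so recall is 0.5. A span counts as correct only when both its boundaries and its type match.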

CLC Number: 

  • TP391